Abstract

The level-3 data products from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) are statistical data
sets derived from level-2 data. Each data set will be based on a fixed global grid of equal-area bins that
are approximately 9 × 9 km². Statistics available for each bin include the sum and sum of squares of the
natural logarithm of derived level-2 geophysical variables, where sums are accumulated over a binning period.
Operationally, products with binning periods of 1 day, 8 days, 1 month, and 1 year will be produced and archived.
From these accumulated values and for each bin, estimates of the mean, standard deviation, median, and mode
may be derived for each geophysical variable. This report contains two major parts. The first (Section 2) is
intended as a users’ guide for level-3 SeaWiFS data products. It contains an overview of level-0 to level-3 data
processing, a discussion of important statistical considerations when using level-3 data, and details of how to
use the level-3 data. The second part (Section 3) presents a comparative statistical study of several binning
algorithms based on CZCS and moored fluorometer data. The operational binning algorithms were selected
based on the results of this study.


1. INTRODUCTION 

The level-3 data processing stage is the first stage in
which data from the Sea-viewing Wide Field-of-view Sensor
(SeaWiFS) are spatially and temporally averaged. Prior
to this stage, a standard set of geophysical variables will
be derived for individual pixels. These level-2 variables
include chlorophyll concentration, a diffuse attenuation
coefficient, and water-leaving radiances in the visible bands
of SeaWiFS.

In generating level-3 data products, pixels containing
valid level-2 data will be mapped to a fixed spatial grid
whose resolution elements are approximately 9 × 9 km². These
square grid elements, or bins, are arranged in rows beginning
at the South Pole. Each row begins at 180° longitude and
circumscribes the Earth at a given latitude. There are
5,940,422 bins for each level-3 data set. Within each bin,
statistics will be accumulated for time periods of 1 day, 8
days (often referred to as the weekly product), 1 month,
and 1 year. There will be a global level-3 data product
archived for each day, 8-day period, calendar month, and
calendar year of the SeaWiFS mission.

The level-3 data products may be used to derive the
mean, standard deviation, and other statistical measures
for the standard level-2 variables, and for certain other
variables, such as primary productivity, which are functions
of level-2 variables. The Coastal Zone Color Scanner
(CZCS) North Atlantic monthly composite chlorophyll
images (Esaias et al. 1986 and Feldman et al. 1989) are
examples of monthly means derived from level-3 CZCS data.

The purpose of binning data is to create reduced-volume
data sets appropriate for use in climate and basin-scale
biogeochemical models. By averaging data over time periods
of several days or longer, problems of missing data can
be overcome. Although temporal and spatial resolutions
are reduced, compared with the level-2 data, the resulting
smoothed level-3 means are effective in depicting seasonal
patterns on regional and basin scales.


There are important statistical considerations that
involve the use of level-3 data. Users should be aware of
these considerations, especially in situations where level-3
data are used in models to derive other variables. For
example, using a mean chlorophyll concentration (a level-3
variable) in an algorithm to derive mean primary productivity
might yield significantly biased results. Recommended
procedures for using level-3 variables in models
are presented in this report.

The remainder of this report is divided into two parts.
The first part (Section 2) is intended to serve as a guide for
users of level-3 data products. Section 2.1 is an overview of
the processing from level-0 to level-3. Section 2.2 contains
a discussion of the important statistical considerations
involved in using level-3 data, and Section 2.3 provides the
equations to be used to compute the mean, standard deviation,
median, and mode of each level-3 variable. Equations
for computing statistics of level-4 variables, derived from
level-3 variables, are given in Section 2.4.

The second part (Section 3) documents a statistical
study, based on CZCS data and moored fluorometer data,
that compared alternative binning algorithms. Results
of this study were the basis for the selection of the binning
algorithm used. Three color plates compare the results of
alternative binning algorithms applied to seven representative
CZCS scenes.

In addition, there are three appendices providing details
for statisticians and programmers who may wish to
write code to bin data. Appendix A explains the procedure
used for mapping pixels to bins based on the center
latitude and longitude of the pixel, and for determining the
latitude and longitude coordinates of a bin. Appendix B
contains details of the weighting scheme used for weighting
data from different orbits (times). Appendix C contains
three pseudocode listings that show how data are accumulated
spatially (Space Binner Code) and temporally (Time
Binner Code), and how means, standard deviations, and
other statistics are calculated from the binned data (Bin
Data Interpreter Code).

2. USERS’ GUIDE 

2.1 Overview of Data Processing 

As the name would suggest, the level of a data product 
refers to the amount of processing that has been applied 
to the data. Certain conventions have been adopted to 
describe the major levels of processing. 

2.1.1 Level-0 Data 

Data recorded on board the satellite and subsequently
broadcast to ground receiving stations are called level-0
data. Data broadcast directly (without being recorded) are
also considered level-0 data. The recorded data provide
either local area coverage (LAC) or global area coverage
(GAC). This classification refers to the spatial resolution of
the data. In SeaWiFS LAC data, the spatial resolution is
1.1 km at nadir (directly beneath the satellite), and pixels
are contiguous.

The GAC data consist of individual pixels having
the same spatial resolution as LAC data (1.1 km), but
the pixels are spaced at 4.4 km intervals. The GAC data
are created on board the satellite by selecting every fourth
pixel on every fourth scan line. This subsampling reduces
the volume of data required to provide global coverage.
A comparative study of alternative GAC sampling algorithms
was reported by McClain et al. (1992).
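
The subsampling described above can be sketched in a few lines of Python. This is an illustration only: the function name subsample_gac, the toy scene, and the choice of starting at the first pixel of the first scan line (offset = 0) are assumptions, not details of the SeaWiFS onboard software.

```python
def subsample_gac(lac_scene, step=4, offset=0):
    """Keep every `step`-th pixel on every `step`-th scan line.

    `lac_scene` is a list of scan lines, each a list of pixels.
    `offset` (assumed to be 0 here) is the first line/pixel retained.
    """
    return [line[offset::step] for line in lac_scene[offset::step]]


# Toy LAC scene: 8 scan lines of 12 pixels, each pixel labelled (line, pixel).
lac = [[(row, col) for col in range(12)] for row in range(8)]
gac = subsample_gac(lac)

print(len(gac), len(gac[0]))  # 2 scan lines of 3 pixels each
print(gac[1])                 # [(4, 0), (4, 4), (4, 8)]
```

Retaining one pixel in sixteen is what reduces the data volume; the retained pixels keep their original 1.1 km footprint and are simply spaced further apart.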

Only a limited amount of LAC data will be recorded 
on board SeaWiFS. However, LAC data will be contin- 
uously broadcast as high-resolution picture transmission 
(HRPT) data to sites around the world which operate li- 
censed ground-receiving stations. All HRPT data will be 
LAC data. 

2.1.2 Level-1a Data

The level-1a products include the raw image data and
all instrument and spacecraft telemetry, as in the level-0
data, together with appended instrument calibration and
navigation data. In addition, instrument telemetry and
selected spacecraft telemetry are reformatted and also
appended.

Approximately 40 minutes of contiguous level-1 data
are produced on the daylight portion of each orbit.
Operationally, this 40-minute swath may be subdivided into
two or more level-1 scenes. The swath may be divided where
the sensor tilt changes, so that each scene nominally has a
constant sensor tilt; other criteria, e.g., a maximum number
of scan lines per scene, may dictate further subdivisions
of the swath.

The level-1a data can be used to calculate calibrated
radiances in units of W m^-2 µm^-1 sr^-1 in the 8 spectral
bands of SeaWiFS. This radiance received at the satellite
altitude is solar radiation backscattered from the Earth’s
atmosphere, ocean, clouds, and land. Water-leaving radiance
(the signal of interest) usually comprises less than
10% of the total signal.

2.1.3 Level-2 Data 

Geophysical properties of the ocean and atmosphere
derived from level-1a data are considered level-2 data.
Level-2 data correspond to the original pixel positions;
there is no remapping. Each level-2 scene corresponds to
a level-1 scene and vice versa; there is no change in the
geographical coverage of each scene for operational products.

Before computing level-2 data, pixels are eliminated if
they contain clouds, sun glint, or other abnormalities. For
pixels that pass these screens, an atmospheric correction
algorithm (Gordon et al. 1983 and Gordon and Castano
1987) is applied to subtract the atmospheric scattering
components from the total radiance, and thus derive the
water-leaving radiances in bands 1-5. Then, bio-optical
algorithms (Clark 1981 and Gordon and Morel 1983) are
applied to the water-leaving radiances to derive in-water
properties.

Standard variables currently planned for computation are:

    L_WN(λ_i)   normalized water-leaving radiances in the bands i = 1-5,
    L_a(λ_i)    atmospheric aerosol radiances in the bands i = 6-8,
    τ_a(865)    aerosol optical thickness at 865 nm (band 8),
    PIG         CZCS-like pigment concentration (mg m^-3),
    CHL         chlorophyll a concentration (mg m^-3), and
    K_490       diffuse attenuation coefficient at 490 nm (m^-1).

2.1.4 Level-3 Data 

The level-3 data are statistical data products derived
by binning level-2 GAC data. This is the first stage at
which data are both spatially and temporally averaged. A
level-3 product will be produced for each day, 8-day period
(week), calendar month, and calendar year of the SeaWiFS
mission. The 8-day periods are started from the first day
of each calendar year. Thus, there will be 46 weeks per
calendar year, with the last week having only 5 or 6 days
instead of 8.
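
The 8-day convention can be checked with a short Python sketch. The function name eight_day_periods is hypothetical and the years used are arbitrary examples; only the rule stated above (periods restart on 1 January, and the last one is truncated at 31 December) is taken from the text.

```python
from datetime import date, timedelta


def eight_day_periods(year):
    """Return (first_day, last_day) pairs for the 8-day periods of `year`."""
    start = date(year, 1, 1)
    end = date(year, 12, 31)
    periods = []
    while start <= end:
        # Each period spans 8 days, except that the last is cut at 31 December.
        stop = min(start + timedelta(days=7), end)
        periods.append((start, stop))
        start = stop + timedelta(days=1)
    return periods


for year in (1998, 2000):  # a common year and a leap year
    periods = eight_day_periods(year)
    first, last = periods[-1]
    print(year, len(periods), (last - first).days + 1)
```

A common year has 45 full periods (360 days) plus a final period of 5 days; a leap year ends with a 6-day period.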

Each data product will contain statistics derived by
mapping level-2 data to a fixed global grid whose resolution
elements (called bins) are approximately 9 × 9 km².
The bins are arranged in rows beginning at 180° longitude
and circumscribing the Earth eastward at a given latitude.
There are 5,940,422 bins for each level-3 data product.
Appendix A contains details related to the gridding scheme,
and the precise areal coverage and geographic location of
each bin.
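
As a rough sketch of how an equal-area grid of this kind can be laid out, the Python below assumes 2160 latitude rows (180°/2160, about 9.3 km per row) and a number of bins per row proportional to the cosine of the row's center latitude, rounded to the nearest integer. Both the row count and the rounding rule are assumptions made for illustration; Appendix A is the authority for the operational scheme.

```python
import math

NUM_ROWS = 2160  # assumed number of latitude rows, South Pole to North Pole


def bins_per_row(num_rows=NUM_ROWS):
    """Number of bins in each row, assuming bin count scales with cos(latitude)."""
    counts = []
    for row in range(num_rows):
        center_lat = (row + 0.5) * 180.0 / num_rows - 90.0
        # Twice as many bins around the equator as there are rows, shrinking
        # toward the poles so that every bin covers roughly the same area.
        counts.append(int(2 * num_rows * math.cos(math.radians(center_lat)) + 0.5))
    return counts


counts = bins_per_row()
print(counts[0], counts[NUM_ROWS // 2], sum(counts))
```

Under these assumptions the polar rows hold 3 bins, the rows next to the equator hold 4320, and the total is close to 4 x 2160^2 / pi (about 5.94 million), in line with the bin count quoted above.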

Statistical data provided with the level-3 data products
will allow users to calculate the mean, standard deviation,
median, and mode for each level-2 variable listed above.
The procedures are described in Section 2.3, and pseudocode
for programming implementation is detailed in
Appendix C.

In addition to level-2 variables, statistical data will also
be provided for the ratio

    IC_K = \frac{CHL}{K_{490}}        (1)

calculated at each pixel in the level-2 data set (but not
saved as a level-2 variable). This ratio, which appears in
several primary productivity algorithms (Balch et al. 1992,
Platt and Sathyendranath 1988, Eppley et al. 1985, Smith
and Baker 1978, and Bannister 1974), may be regarded as
the integral chlorophyll (units of mg m^-2) integrated over
the upper optical depth. The rationale for including this
as a level-3 variable will be presented in Section 2.2.

In addition to the level-3 data products, a number of
standard level-3 image products will be produced. These
will include standard mapped images, which are equirectangular
projections of means derived from the level-3 statistical
data, and reduced-resolution images intended for
browsing purposes.

2.1.5 Level-4 Data 

In this report, variables derived from level-3 data will
be called level-4 variables. It is anticipated that level-3
data will be used as input to biogeochemical models where
the goal of the modeling is to estimate global fluxes of key
elements such as carbon and nitrogen. In such applications,
it is important that the level-4 variable represent a
spatial-temporal mean, e.g., the average daily, weekly, or
monthly carbon flux. The practice of substituting means
into models to produce spatial-temporal means can result
in significantly biased results. This will be discussed
further in Section 2.2.

The methods used to produce the level-3 SeaWiFS data
have been designed to overcome this problem for a large
class of level-4 variables. Procedures for computing
unbiased estimates of the mean of level-4 variables will be
discussed in detail in the following sections.

2.2 Statistical Considerations 

The question of how to bin SeaWiFS data revolved
around certain statistical issues. Many of the issues or
questions raised had come to light through the experience
of binning CZCS data into daily, monthly, and yearly
composites. There were several proposed ways to average data,
and results would be significantly different depending on
the method chosen. It was further recognized that the
choice of method should depend on how level-3 SeaWiFS
data are to be used. The practice of using level-3 means in
equations to derive level-4 means was inappropriate, and,
therefore, this issue had to be addressed as well.

Following is a discussion of four major issues and the 
summary of the decisions related to each. In many in- 
stances, decisions were based on a statistical analysis of 
CZCS data and moored fluorometer time-series data. The 
results of the statistical study are presented in Section 3. 
The four issues were: 

1. Should statistics be computed for CHL or for
log(CHL)? What about other level-2 variables?

2. What is the best method for estimating level-4
variables?

3. What statistics should be saved for each sampling
domain?

4. Should the temporal statistics give equal weight
to all data falling within the sampling domain?
Or, should some accommodation be made to
compensate for the uneven temporal distribution
of data?

2.2.1 CHL vs. log(CHL) Statistics 

Chlorophyll measurements tend to be lognormally dis- 
tributed, i.e., log(CHL) is normally distributed, in large 
data sets of satellite or ship data (Fig. 1). Lognormal dis- 
tributions occur commonly in biological processes where 
the rate of change of a variable is proportional to its size 
(Aitchison and Brown 1957 and Crow and Shimizu 1988). 
One of the first issues addressed, therefore, was whether or 
not statistics should be computed for CHL or for log(CHL). 
The same question was also addressed for other varia- 
bles. 

It is fairly common practice to log-transform CHL mea- 
surements before using them in other derivations. For ex- 
ample, Chelton and Schlax (1991) used log-transformed 
data in comparing time averages of chlorophyll data. The 
CZCS pigment algorithm was derived by a linear regression 
of log(CHL) versus log-transformed radiance ratios, and 
CZCS pigment images are usually scaled according to the 
logarithm of pigment. The mean derived by first averaging 
log-transformed data and then inverting the transform is 
the geometric mean. Is the geometric mean preferable to 
the arithmetic mean? 

It was agreed at the outset that the arithmetic mean 
is the appropriate mean for most biogeochemical applica- 
tions. The mean chlorophyll concentration, for example, 
represents the mean biomass per unit volume which will 
subsequently be multiplied by total volume (depth x area) 
to estimate regional or global biomass. However, the sam- 
ple mean derived from small samples might be a poor es- 
timator of the true population mean. 

Fig. 1. Histograms of chlorophyll concentration derived from in situ measurements. The top panel displays
11,176 measurements from the world ocean collected by C.S. Yentsch, 1956-86. The bottom panel displays
1,047 surface measurements from the northwest Atlantic continental shelf, Marine Resources Monitoring,
Assessment, and Prediction (MARMAP), 1978-82. (Campbell and O’Reilly 1988)

Fig. 2. The lognormal distribution: The top panel displays a histogram of log(X), where log(X) is normally
distributed with mean 0 and standard deviation 0.4. The bottom panel shows the corresponding histogram
of the lognormal variable, X.

Let X be a lognormally distributed variable (Fig. 2),
and let \bar{X} denote the true mean of X within a sampling
domain. In the context of the SeaWiFS data processing,
“sampling domain” refers to a specific bin and averaging
period; X is the level-2 variable, and \bar{X} its level-3
equivalent. The question is: what is the best method for
estimating \bar{X} given a sample of n measurements (pixels)
X_1, ..., X_n?

In the case of a lognormal distribution, the sample
mean (or arithmetic average):

    X_{avg} = \frac{1}{n} \sum_{i=1}^{n} X_i        (2)

tends to underestimate the true population mean when 
sample sizes are small (Baker and Gibson 1987). The 
higher the variance of the underlying distribution, the more 
this is true. The reason for this is that small samples tend 
to miss high values which occur much less frequently than 
low values. However, the high values have a significant 
influence on the mean of the distribution. For example, 
much of the biological production in the ocean occurs in 
localized areas such as upwelling zones, and in transient 
blooms of relatively short duration. A sample that misses 
these areas and blooms would significantly underestimate 
global or regional production. 

Sample sizes involved in binning GAC data will be
small. Since the GAC data have a 4 km spacing between
pixels, at most 9 pixels from a single orbital pass can fall
into a 9 × 9 km bin. The average sample size will be
closer to four in data sets derived from a single orbital pass.
Although sample sizes will increase with longer averaging
periods, the variance will also increase. Thus, there was
concern that small sample sizes and large variances might
make the arithmetic average a poor estimator for level-3
means.
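
The small-sample concern can be illustrated with a simple simulation. The sketch below draws repeated samples from a lognormal distribution and counts how often the arithmetic average falls below the true mean. The distribution parameters (mean 0 and standard deviation 1 for ln X), the sample sizes, the number of trials, and the function name are all arbitrary choices made for illustration; none comes from the CZCS study.

```python
import math
import random


def fraction_below_true_mean(n, sigma, trials=10000, seed=0):
    """Fraction of size-n lognormal samples whose average is below the true mean."""
    rng = random.Random(seed)
    true_mean = math.exp(0.5 * sigma ** 2)  # mean of a lognormal with mu = 0
    below = 0
    for _ in range(trials):
        sample = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        if sum(sample) / n < true_mean:
            below += 1
    return below / trials


for n in (1, 4, 9, 100):
    print(n, round(fraction_below_true_mean(n, sigma=1.0), 3))
```

The arithmetic average is unbiased in expectation, but its sampling distribution is right-skewed, so most individual small samples give a value below the true mean; the fraction moves toward one half as n grows.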

The practice of transforming data first, computing the
mean, m_x, of log-transformed data

    m_x = \frac{1}{n} \sum_{i=1}^{n} \ln(X_i)        (3)

and then estimating the mean of X as

    X_{geom} = e^{m_x}        (4)

gives the geometric mean. In the case of a lognormal variable,
the geometric mean is the median of the distribution.
For any distribution that is positively skewed, the geometric
mean will underestimate the population mean.

Studies have shown that the maximum likelihood estimator
for a lognormal mean

    X_{mle} = e^{(m_x + \frac{1}{2} s_x^2)}        (5)

performs better than either of the other two when variances
are large and sample sizes small (Baker and Gibson 1987).
In (5), m_x is the sample mean of ln(X), given by (3), and
s_x^2 is the sample variance given by

    s_x^2 = \frac{1}{n} \sum_{i=1}^{n} [\ln(X_i) - m_x]^2.        (6)

Note that this is not the more commonly used unbiased estimator,
which uses a divisor of n - 1 instead of n. However,
this is the maximum likelihood estimator for the variance
of a normal random variable. In order for (5) to be the
maximum likelihood estimator for \bar{X}, m_x and s_x^2 must be
maximum likelihood estimators for the mean and variance
of ln(X) (Crow and Shimizu 1988).
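
The three estimators in (2), (4), and (5) can be computed side by side. The Python below is a minimal sketch; the function name and the sample values are hypothetical, and the variance uses the divisor n as in (6).

```python
import math


def lognormal_estimators(sample):
    """Return the arithmetic, geometric, and MLE estimates of the mean."""
    n = len(sample)
    logs = [math.log(x) for x in sample]
    m_x = sum(logs) / n                           # eq. (3)
    s2_x = sum((v - m_x) ** 2 for v in logs) / n  # eq. (6), divisor n
    return {
        "avg": sum(sample) / n,                   # eq. (2)
        "geom": math.exp(m_x),                    # eq. (4)
        "mle": math.exp(m_x + 0.5 * s2_x),        # eq. (5)
    }


# Hypothetical chlorophyll values (mg m^-3) falling in one bin.
print(lognormal_estimators([0.1, 0.2, 0.4, 3.2]))
```

Because s_x^2 is never negative, the MLE estimate is always at least as large as the geometric mean; the two coincide only when all values in the sample are equal.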

In the statistical study presented in Section 3, the three
estimators, X_avg, X_geom, and X_mle, were compared using
CZCS data and a time series of moored fluorometer data
(Medeiros and Wirick 1992). Results obtained for both
time and space averages were:

1. The sample mean, X_avg (2), and the maximum
likelihood estimator, X_mle (5), gave equivalent
results.

2. The geometric mean or median, X_geom (4), was
systematically less than the other two.

The same results were obtained for other standard CZCS
variables: K_490 and normalized water-leaving radiances
L_WN(λ_i). Thus, based on their performance as estimators
of the mean, X_avg and X_mle were regarded as acceptable
estimators for the true population mean, \bar{X}.

2.2.2 Estimating Level-4 Variables 

It is not possible to prescribe a general method for
estimating level-4 variables. The appropriate method will
depend on the nature of the relationship involved, i.e.,
whether it is linear or nonlinear, and the form it takes.

Let Y = f(X) be a relationship that defines the variable
Y as a function of the level-2 variable X, and let \bar{Y} be
the level-3 equivalent of Y. That is, \bar{Y} represents the true
mean of Y within a sampling domain. In general, X may
be a vector of level-2 variables, i.e., Y may be a function
of more than one level-2 variable.

The problem that motivates this issue is that \bar{Y} is not,
in general, equal to f(\bar{X}). Substitution of the mean of X
into the function is only legitimate for linear functions. In
general, the mean of a function of several variables is not
equal to the function of the means.
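
A two-point numerical example makes the distinction concrete. The function f(x) = 1/x and the values below are hypothetical, chosen only because the arithmetic is easy to follow.

```python
def mean(values):
    return sum(values) / len(values)


def f(x):
    return 1.0 / x  # hypothetical nonlinear relationship Y = f(X)


x = [1.0, 3.0]
mean_of_f = mean([f(v) for v in x])  # (1 + 1/3) / 2, about 0.667
f_of_mean = f(mean(x))               # 1 / 2 = 0.5

print(mean_of_f, f_of_mean)
```

Substituting the mean of X into f gives 0.5, whereas the true mean of Y is about 0.667. For a linear function such as f(x) = 2x + 1 the two quantities would agree.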

For any general function, the only way to obtain an
accurate estimate of the true mean, \bar{Y}, would be to compute
Y_i = f(X_i) at each pixel in the level-2 data, and then
determine its average using either the arithmetic average,
Y_avg, or the maximum likelihood estimate, Y_mle. In this
case, the function Y = f(X) would be a level-3 variable
computed by averaging over pixels in the level-2 data. An
example is IC_K (1), which will be computed in this way.
It is not possible or practical to anticipate the many
functions or mathematical relationships that may be applied
to SeaWiFS data. Thus, there needed to be guidelines
and methods for using level-3 data to obtain accurate
estimates of the mean of level-4 variables.

The decision was made to use the maximum likelihood
estimation (MLE) method instead of the more common
arithmetic average (AVG) method because the MLE
method provides a way to estimate the mean (and other
statistics) for a large class of level-4 variables of the form

    Y = A X^B        (7)

where A and B are constants, and X is a single variable,
i.e., not a vector.

For variables in this class, ln(Y) is linearly related to
ln(X):

    \ln(Y) = \ln(A) + B \ln(X).        (8)

Therefore, the mean and variance of ln(Y) can be estimated
as

    m_y = \ln(A) + B m_x        (9)

and

    s_y^2 = B^2 s_x^2        (10)

where m_x and s_x^2 are the mean and variance of ln(X)
derived from the level-3 statistics saved for X.

According to the MLE method, the mean of Y is then
given by

    Y_{mle} = e^{(m_y + \frac{1}{2} s_y^2)}.        (11)

It should be noted that if (5) proves to be an accurate estimator
for the mean of X, then (11) will be an accurate estimator
for the mean of Y. There is no loss of accuracy since
(8)-(10) are exact relationships (not approximations).

The procedures for estimating the variance and other
statistics of level-3 and level-4 variables are described in
more detail in Sections 2.3 and 2.4 and in Appendix C. The
equations used are based on MLE methods for estimating
parameters of a lognormal distribution, and hence, they
are referred to as MLE estimators. As will be shown, the
MLE estimator is a robust estimator for the mean. That
is, it generally performs well even when the underlying
distribution is not lognormal. Indeed, the MLE method was
not selected on the basis of an assumed lognormal distribution,
but because it performed well compared with the
arithmetic average (AVG estimator), and because it provided
a method for estimating the mean of level-4 variables
of the form given by (7).
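
The chain (9)-(11) can be sketched as follows. The function names, the sample values, and the choice A = -ln(0.01), B = -1 (the form taken by a 1% light-penetration depth expressed as a function of K_490) are assumptions made for illustration. The sketch also confirms that working from the saved statistics of ln(X) gives the same answer as transforming every pixel first, and it requires A > 0 so that ln(A) exists.

```python
import math


def log_stats(values):
    """Mean and variance (divisor n) of the natural log of `values`."""
    logs = [math.log(v) for v in values]
    m = sum(logs) / len(logs)
    s2 = sum((v - m) ** 2 for v in logs) / len(logs)
    return m, s2


def level4_mle_mean(a, b, m_x, s2_x):
    """MLE mean of Y = a * X**b from the saved statistics of ln(X)."""
    m_y = math.log(a) + b * m_x        # eq. (9)
    s2_y = b ** 2 * s2_x               # eq. (10)
    return math.exp(m_y + 0.5 * s2_y)  # eq. (11)


k490 = [0.03, 0.05, 0.08, 0.20]  # hypothetical pixel values (m^-1)
a, b = -math.log(0.01), -1.0

# Route 1: use only the statistics of ln(X) that a level-3 product would hold.
m_x, s2_x = log_stats(k490)
from_saved = level4_mle_mean(a, b, m_x, s2_x)

# Route 2: compute Y at every pixel, then apply the MLE estimator to Y.
m_y, s2_y = log_stats([a * k ** b for k in k490])
per_pixel = math.exp(m_y + 0.5 * s2_y)

# Shortcut that the text warns against: substitute the mean of X into f.
substituted = a * (sum(k490) / len(k490)) ** b

print(from_saved, per_pixel, substituted)
```

The first two numbers agree to rounding error, which is the sense in which (8)-(10) are exact; the third is noticeably smaller for these values.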

An example of such a function is the euphotic depth,
which is commonly defined as the 1% light-penetration
depth (Kirk 1983). Using the level-2 variable K_490 and
applying Beer’s Law, this depth may be defined as

    Z_e = \frac{-\ln(0.01)}{K_{490}}        (12)

which represents the 1% light-penetration depth at λ =
490 nm. If the mean of K_490 based on level-3 data is used
to estimate the mean euphotic depth, this will yield a biased
estimate of the mean euphotic depth. However, the
MLE method allows for an accurate estimate of the mean
Z_e based on the saved statistics of ln(K_490).

The equations proposed by Morel and Berthon (1989)
for deriving integral euphotic chlorophyll, <Chl>_tot, from
satellite-derived chlorophyll (or pigment) also take the form
of (7). Several algorithms for estimating integral productivity
(Smith et al. 1982, Platt 1986, and Morel and Berthon
1989) involve the product of <Chl>_tot and photosynthetically
available radiation (PAR) at the surface, PAR(0).
The mean of this product can be derived as the product of
the means of <Chl>_tot and PAR(0) since the two variables
are uncorrelated. Thus, these algorithms may be applied
to level-3 data using the saved statistics of standard level-2
variables.

2.2.3 Statistics Saved for Each Domain

Another issue that was raised concerned the choice of
statistics to save for each sampling domain. Given that
X_mle (5) is to be used for estimating the mean of the
level-2 data in each domain, the statistics saved must include
the sum and sum of squares of the natural logarithm
of each variable. In addition, counts of the number of pixels
contributing to the sums and similar ancillary information
should also be saved.

Beyond this, further questions regarding what statistics
to save are motivated by the concern expressed earlier
as to how level-4 variables will be estimated. Two alternatives
exist: either a) sufficient information is provided
in the level-3 data to allow estimation of these variables
using saved statistics of other variables or b) the variables
should be computed at each pixel of level-2 data and their
statistics saved as part of the level-3 data set. The latter
is more costly from the standpoint of the storage required
to add additional level-3 variables. As stated earlier, the
MLE method permits the former choice for variables of the
form given in (7).

There are other level-4 variables which cannot be calculated
using only the saved statistics of the standard level-2
variables. Any variable that is a function of two or more
level-2 variables would require additional information on
the covariances between level-2 variables. An example of
this is the variable IC_K (1), which appears in several primary
productivity algorithms (Balch et al. 1992, Platt and
Sathyendranath 1988, Eppley et al. 1985, Smith and Baker
1978, and Bannister 1974). To apply the MLE method,
one must estimate the mean and variance of the natural
logarithm of IC_K

    \ln(IC_K) = \ln(CHL) - \ln(K_{490}).        (13)

The mean of ln(IC_K) is simply the difference between
the means of ln(CHL) and ln(K_490), but the variance of
ln(IC_K):

    var[\ln(IC_K)] = var[\ln(CHL)] + var[\ln(K_{490})] - 2 cov[\ln(CHL), \ln(K_{490})]        (14)

involves the covariance, denoted by “cov” in (14), between
ln(CHL) and ln(K_490), as well as their variances. The
CZCS algorithms for CHL and K_490 in Section 3 resulted
in a nonlinear relationship between ln(CHL) and ln(K_490).
Thus, their covariance varied from sample to sample. For
this reason, it was decided to compute the variable IC_K
(1) at each pixel in the level-2 data and save statistics of
ln(IC_K) as part of the level-3 data.
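
The role of the covariance term can be checked numerically. In the sketch below the log-transformed values are hypothetical, and the helper functions use the divisor n throughout, matching the convention of (6).

```python
def mean(v):
    return sum(v) / len(v)


def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Hypothetical ln(CHL) and ln(K_490) values for the pixels in one bin.
ln_chl = [-2.3, -1.6, -0.9, 0.2]
ln_k = [-3.4, -3.1, -2.8, -2.0]

ln_ick = [c - k for c, k in zip(ln_chl, ln_k)]
direct = var(ln_ick)
from_parts = var(ln_chl) + var(ln_k) - 2 * cov(ln_chl, ln_k)
without_cov = var(ln_chl) + var(ln_k)

print(direct, from_parts, without_cov)
```

The first two values agree, while the third does not. Variances saved separately for ln(CHL) and ln(K_490) are therefore not enough to recover the variance of their difference, which is why statistics of the ratio are accumulated pixel by pixel.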


2.2.4 Weighting of Temporal Statistics 

After each level-2 scene is generated, valid level-2 data
from individual pixels will be binned. Sums and sums
of squares accumulated at this stage are called spatial
statistics, i.e., no temporal averaging is involved since data
from the same scene are regarded as simultaneous. Spatial
statistics from the same day will be combined into daily
products, from the same 8-day period into weekly products,
and so forth. The daily, weekly, monthly, and longer-term
products will become the level-3 data, and the spatial
statistics pertaining to individual scenes will be discarded.
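
The spatial accumulation step can be sketched as below. The function name bin_scene, the (bin index, value) input format, and the use of a dictionary keyed by bin index are illustrative assumptions; the operational procedure is the Space Binner Code in Appendix C.

```python
import math
from collections import defaultdict


def bin_scene(pixels):
    """Accumulate spatial statistics for one level-2 scene.

    `pixels` is an iterable of (bin_index, value) pairs for valid pixels.
    Returns {bin_index: [n, sum of ln(x), sum of ln(x) squared]}.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for bin_index, value in pixels:
        log_value = math.log(value)
        entry = stats[bin_index]
        entry[0] += 1
        entry[1] += log_value
        entry[2] += log_value * log_value
    return dict(stats)


# Hypothetical scene: three valid pixels falling into two bins.
scene = [(7, 0.12), (7, 0.15), (9, 0.40)]
for bin_index, (n, sum_x, sum_xx) in bin_scene(scene).items():
    print(bin_index, n, round(sum_x, 4), round(sum_xx, 4))
```

Only the count and the two sums need to be kept for each bin, so scenes can be processed one at a time without retaining the individual pixel values.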

On a given day, there may be two sets of spatial statistics
for the same bin. Two sets might occur within the
same orbit on different tilt segments, i.e., before and after
a change in the sensor’s tilt, or from different orbits
in high-latitude areas where swaths overlap. In the case
of two sets from the same orbit, only one set will be used.
The set having the better sun-target viewing geometry will
be selected. However, two sets of spatial statistics from
different orbits will receive the same treatment as spatial
statistics from different days. The same algorithms, called
temporal binning algorithms, will be used to combine data
separated by time gaps regardless of the size of the time
gap.

Let N be the number of sets of spatial statistics (orbits)
contributing to a temporal mean; let t_i be the time at
which the ith set was acquired; and let n_i be the number
of pixels contributing to the ith set, where i = 1, ..., N. In
considering the temporal binning algorithms, a major concern
was the fact that the times are unevenly distributed,
and that the sample size (hence precision) varies from one
time to another. Sample sizes will vary between 1 and
9, depending on where the bin lies relative to the ground
track. Time gaps occur because of clouds, sun glint, and
other factors.

The methods used to compensate for unevenly distributed
data generally involve a scheme for weighting data.
The alternative is to use simple composite statistics
(unweighted data), which was the method used to create
level-3 CZCS data such as the North Atlantic monthly
composites (Feldman et al. 1989 and Esaias et al. 1986).
These monthly composites have served as useful products
for a number of scientific investigations (Campbell and
Aarup 1992, Yentsch 1990, and Lewis et al. 1988), but
some of the spatial patchiness in these data sets is an
artifact of the uneven temporal distribution of data.

Chelton and Schlax (1991) have made a strong case for 
the superiority of optimal interpolation methods as com- 
pared to simple composite averages for deriving temporal 
means of irregularly spaced data. Such methods, known 
as kriging in the geostatistics literature (Journal 1989), re- 
quire the use of correlation functions which must be deter- 
mined a priori. When applied to satellite data, the meth- 
ods could require both temporal and spatial correlation 
functions. 

The advantage of optimal interpolation methods is that
they allow estimates to be based on data that lie outside
the domain (bin and time interval) being estimated.
The disadvantage is their computational complexity. Data
must be deseasonalized before applying the optimal interpolation
method. That is, seasonal trends must be estimated
and subtracted from the data. Therefore, at least
a year of data must be collected before optimal interpolation
methods can be applied. This is not compatible with
the plan to generate level-3 data products along with the
level-2 data processing.

It was decided not to apply optimal interpolation methods
in the level-3 binning process. However, the binned
statistics will be useful in applying optimal interpolation
methods during post-processing. As an example, daily
composite statistics might be used in deriving weekly and
monthly means using optimal interpolation methods.

The question was, therefore, whether to use simple
composite statistics (all data within a given domain are
given equal weight) or to develop a weighting scheme that
could be implemented easily at the time the level-2 data
are processed. In general, a decision to use weighted versus
unweighted statistics should depend on the distribution
of the data vis-a-vis any trends that might exist. Simple
unweighted statistics are recommended in the case where
there is no trend (either spatial or temporal), or where the
trend is impractical to estimate. The latter is the case for
the spatial statistics. These will be unweighted sums and
sums of squares of the pixels falling within each bin because
it is impractical to estimate spatial trends for each
bin.

In the case of weekly and monthly statistics, there may
be significant trends that call for weighted sums. If simple
composite (unweighted) statistics are used, each of the
N sets of spatial statistics will, in effect, be weighted by
its sample size, $n_i$. Thus, for example, a data set having
$n_i = 9$ would be much more heavily weighted than one
with $n_i = 1$. Trends may be lost in this process.
Alternatively, a temporal mean might be calculated as the average
of N spatial means, regardless of the number of pixels
contributing to the spatial means. However, this would give
too much weight to a data set with $n_i = 1$ compared to one
with $n_i = 9$. This concern reflects the belief that precision
is a function of sample size.

As a compromise between these two alternative approaches,
it was decided to apply a weight of $\sqrt{n_i}$ to the spatial mean
at time $t_i$, where $n_i$ is the number of pixels falling in the
bin at time $t_i$. This is effected by applying the weight

$$w_i = \frac{1}{\sqrt{n_i}} \qquad (15)$$

to the sums and sums of squares associated with the spatial
statistics for time $t_i$. Details of the weighting scheme are
given in Appendix B.
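
A small numerical sketch may help compare the three weighting choices discussed above. The spatial means and pixel counts below are hypothetical values chosen only for illustration, and the helper name `weighted_mean` is an assumption, not part of any SeaWiFS software.

```python
import math

# Hypothetical spatial means of ln(X) from N = 2 orbital passes,
# and the number of pixels n_i that contributed to each mean.
spatial_means = [0.0, 1.0]
n = [9, 1]

def weighted_mean(means, weights):
    """Weighted average of the spatial means."""
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

# Simple composite: each spatial mean is weighted by its sample size n_i.
by_count = weighted_mean(spatial_means, n)
# Average of spatial means: every pass gets the same weight.
equal = weighted_mean(spatial_means, [1] * len(n))
# Compromise: each spatial mean is weighted by sqrt(n_i).
by_sqrt = weighted_mean(spatial_means, [math.sqrt(k) for k in n])

print(by_count, equal, by_sqrt)
```

With these inputs the simple composite gives 0.1, the equal-weight average gives 0.5, and the square-root weighting gives 0.25, which lies between the two extremes.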


2.3 Protocols for Level-3 Statistics

The level-3 data products available for each day, week,
month, and year of the SeaWiFS mission will allow users to
compute the mean, standard deviation, median, and mode
of each level-3 variable in each bin. The level-3 variables
consist of level-2 variables, and in addition, the variable
$IC_K$ (1).

For each level-3 variable $X$, the level-3 data consist
of a pair of sums for each bin

$$S_1 = \sum_{i=1}^{N} \frac{1}{\sqrt{n_i}} \sum_{j=1}^{n_i} \ln(X_{ij}) \qquad (16)$$

$$S_2 = \sum_{i=1}^{N} \frac{1}{\sqrt{n_i}} \sum_{j=1}^{n_i} \left[\ln(X_{ij})\right]^2 \qquad (17)$$

where $X_{ij}$ is the $j$th observation of $X$ at time $t_i$. Each
observation corresponds to a pixel in the level-2 data. The
number $n_i$ is the number of pixels at time $t_i$ containing
valid level-2 data.

In addition, the following statistics are saved for each
bin:

   b   bin index number (range: 1, ..., 5,940,422),
   N   total number of orbits contributing data,
   n   total number of pixels contributing data, and
   W   sum of weights.

For the latter two quantities, their formulation is as follows:

$$n = \sum_{i=1}^{N} n_i \qquad (18)$$

and

$$W = \sum_{i=1}^{N} \sqrt{n_i}. \qquad (19)$$
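
The accumulation of the per-bin quantities can be sketched as follows. This is a minimal illustration under stated assumptions, not the operational binning code: the function name `accumulate_bin`, the dictionary keys `S1` and `S2` (names chosen here for the weighted sum and sum of squares of the logarithms), and the input layout (one list of valid pixel values per orbital pass) are all hypothetical.

```python
import math

def accumulate_bin(passes):
    """Accumulate level-3 quantities for one bin.

    passes: list of lists; each inner list holds the valid (positive)
    level-2 values X_ij that fell in the bin during one orbital pass.
    """
    S1 = S2 = W = 0.0
    N = n = 0
    for values in passes:
        n_i = len(values)
        if n_i == 0:
            continue                      # pass contributed no data
        w_i = 1.0 / math.sqrt(n_i)        # weight applied to the sums, eq. (15)
        logs = [math.log(x) for x in values]
        S1 += w_i * sum(logs)             # weighted sum of ln(X), eq. (16)
        S2 += w_i * sum(v * v for v in logs)  # weighted sum of squares, eq. (17)
        N += 1                            # passes contributing data
        n += n_i                          # total pixels, eq. (18)
        W += math.sqrt(n_i)               # sum of weights, eq. (19)
    return {"S1": S1, "S2": S2, "N": N, "n": n, "W": W}

# Example: four pixels equal to 1 on one pass, one pixel equal to e on another.
stats = accumulate_bin([[1.0, 1.0, 1.0, 1.0], [math.e], []])
print(stats)
```

Because the sums are additive, statistics for longer binning periods could be formed by adding the stored quantities of shorter periods, which is one reason for saving sums rather than means.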


In addition to the above variables, there will be a 16-bit
time distribution variable T whose bits indicate whether
data were available (bit = 1) or absent (bit = 0) in time
intervals (days, two-day intervals, or months) covered by the
averaging period. That is, each bit of the 16-bit number
represents a time interval within the averaging period, and
if a bit is set to 1, it indicates data were available during
that interval.
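
One way to picture the time distribution variable is as a 16-bit mask. The sketch below assumes that bit k (counting from the least significant bit) corresponds to the k-th interval of the averaging period; the bit ordering and the function names are assumptions made for illustration and are not specified in this section.

```python
def set_interval(T, k):
    """Return T with bit k set, marking data as available in interval k."""
    if not 0 <= k < 16:
        raise ValueError("interval index must be in 0..15")
    return T | (1 << k)

def intervals_with_data(T):
    """List the interval indices whose bits are set in the 16-bit value T."""
    return [k for k in range(16) if T & (1 << k)]

# Example: data were available in intervals 0, 3, and 7 of the period.
T = 0
for k in (0, 3, 7):
    T = set_interval(T, k)

print(T, intervals_with_data(T))
```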


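2.3.1 The Mean and Variance of ln(X)

To estimate statistics for the variable $X$, the first step
is to calculate the mean and variance of $\ln(X)$. These are
given by

$$m_x = \frac{S_1}{W} \qquad (20)$$

and

$$s_x^2 = \frac{S_2}{W} - m_x^2 \qquad (21)$$

where $S_1$ and $S_2$ are the weighted sum (16) and sum of
squares (17) of $\ln(X)$, and $W$ is the sum of weights (19).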


2.3.2 The Mean and Other Statistics of X

The mean of $X$ is estimated by

$$X_{mle} = e^{m_x + s_x^2/2} \qquad (22)$$

and the standard deviation by

$$SD_X = X_{mle} \sqrt{e^{s_x^2} - 1}. \qquad (23)$$

The median or geometric mean may be estimated by

$$X_{med} = e^{m_x} \qquad (24)$$

and the mode (most frequent value) by

$$X_{mod} = e^{m_x - s_x^2}. \qquad (25)$$

The above equations are based on the MLE method,
which was demonstrated to be valid for means of CZCS
data and moored fluorometer data. Equations (22)-(25)
are based on an assumed lognormal distribution of $X$
within the sampling domain. For a discussion of the underlying
assumptions and robustness of the estimators see
Section 3.3.
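
The four statistics of a lognormally distributed variable follow directly from the mean and variance of its logarithm. The sketch below uses the standard lognormal relations; the function name `lognormal_stats` and its argument names are hypothetical.

```python
import math

def lognormal_stats(m_x, s2_x):
    """Statistics of X given the mean (m_x) and variance (s2_x) of ln(X),
    assuming X is lognormally distributed."""
    mean = math.exp(m_x + s2_x / 2.0)               # eq. (22)
    sd = mean * math.sqrt(math.exp(s2_x) - 1.0)     # eq. (23)
    median = math.exp(m_x)                          # eq. (24), geometric mean
    mode = math.exp(m_x - s2_x)                     # eq. (25)
    return mean, sd, median, mode

# Example with m_x = 0 and s2_x = ln(2).
mean, sd, median, mode = lognormal_stats(0.0, math.log(2.0))
print(mean, sd, median, mode)
```

For any positive variance the ordering mode < median < mean holds, which is why the median (geometric mean) underestimates the mean of a lognormal variable.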


2.4 Protocols for Level-4 Statistics

As defined earlier, a variable, $Y = f(X)$, which is a
function of one or more level-3 variables, is called a level-4
variable. Here, guidelines are given for computing statistics
of several classes of level-4 variables. It is not possible
to specify protocols for all level-4 variables, in general,
because the procedures depend on the function $f(X)$.

2.4.1 Computing Statistics for Y = A + BX

If $Y$ is a linear function of $X$, then the mean of $Y$ is
given by the same linear function of the mean of $X$

$$Y_{mle} = A + B X_{mle}. \qquad (26)$$

The same is true for the median and mode of $Y$. The
standard deviation of $Y$ is scaled by the factor $B$

$$SD_Y = B\,(SD_X). \qquad (27)$$

2.4.2 Computing Statistics for Y = AX^B

The MLE method was chosen because it provides a
robust method for estimating the mean of level-4 variables
of this form. To use the MLE method, one must first
estimate the mean and variance of $\ln(Y)$. These statistics,
$m_y$ and $s_y^2$, can then be substituted into (22)-(25), in place
of $m_x$ and $s_x^2$, to estimate the mean, standard deviation,
median, and mode of $Y$.

Let $Y = f(X)$ be a function of this form where $X$ is
a single level-3 variable. Its natural logarithm is a linear
function of $\ln(X)$ (8). If $m_x$ and $s_x^2$ are statistics of
$\ln(X)$ derived from the level-3 data sets by (20) and (21),
respectively, then the mean and variance of $\ln(Y)$ are,
respectively:

$$m_y = \ln(A) + B m_x \qquad (28)$$

and

$$s_y^2 = B^2 s_x^2. \qquad (29)$$

Statistics of $Y = f(X)$ can be derived by substituting
$m_y$ for $m_x$ and $s_y^2$ for $s_x^2$ in (22)-(25).
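
The protocol for a power-law function can be sketched in a few lines. The function names and the numerical values are hypothetical; the sketch assumes $X$ is lognormal, so that $Y = AX^B$ is lognormal as well.

```python
import math

def power_law_log_stats(m_x, s2_x, A, B):
    """Mean and variance of ln(Y) for Y = A * X**B, given those of ln(X)."""
    m_y = math.log(A) + B * m_x      # eq. (28)
    s2_y = B * B * s2_x              # eq. (29)
    return m_y, s2_y

def mle_mean(m, s2):
    """MLE estimate of the mean of a lognormal variable, eq. (22)."""
    return math.exp(m + s2 / 2.0)

# Hypothetical statistics of ln(X) for one bin, and a hypothetical function.
m_x, s2_x = 0.5, 0.2
A, B = 2.0, -1.0

m_y, s2_y = power_law_log_stats(m_x, s2_x, A, B)
y_mean = mle_mean(m_y, s2_y)
print(m_y, s2_y, y_mean)
```

Note that the mean of $Y$ obtained this way generally differs from $A X_{mle}^B$, the value obtained by applying the function to the mean of $X$.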


2.4.3 Statistics for Other Functions 

So far the only considerations were functions of a single
variable $X$. In general, if $Y$ is a function of two or
more level-3 variables, knowledge of the covariances
between the level-3 variables is required to derive statistics
for $Y$. It was initially recommended that a covariance
matrix be saved as part of the level-3 statistics, but the
storage costs were considered too high. Subsequently, it was
decided to save statistics of $IC_K$ because this function
appears frequently in primary productivity algorithms.

Another situation involving a function of several level-2
variables occurs when a regional bio-optical algorithm is
applied to derive better estimates of the CZCS-like pigment
concentration. For example, suppose the standard
(global) CZCS-like pigment algorithm is

$$PIG = A_s \left[\frac{L_{WN}(\lambda_i)}{L_{WN}(\lambda_j)}\right]^{B_s} \qquad (30)$$

where $L_{WN}(\lambda_i)$ and $L_{WN}(\lambda_j)$ are the normalized
water-leaving radiances in bands $i$ and $j$, and the wish is to
compute pigment according to an alternative algorithm

$$PIG_r = A_r \left[\frac{L_{WN}(\lambda_i)}{L_{WN}(\lambda_j)}\right]^{B_r} \qquad (31)$$

using regionally-derived parameters, $A_r$ and $B_r$. In this
situation, it is possible to use the saved level-3 statistics
for PIG to estimate statistics for $PIG_r$. Substituting the
means of $L_{WN}(\lambda_i)$ and $L_{WN}(\lambda_j)$ into (31) is not
recommended.

The recommended procedure is, first, to estimate the
mean and variance of $\ln(PIG)$ according to (20) and (21).
These statistics can be denoted by $m_s$ and $s_s^2$, respectively.
The mean of $\ln(PIG_r)$ is then given by

$$m_r = \ln(A_r) + \frac{B_r}{B_s}\left[m_s - \ln(A_s)\right] \qquad (32)$$

and the variance of $\ln(PIG_r)$ is

$$s_r^2 = \left(\frac{B_r}{B_s}\right)^2 s_s^2 \qquad (33)$$

where $A_s$ and $B_s$ are the parameters of the standard
algorithm (30). These statistics can then be substituted into
(22)-(25), with $m_r$ replacing $m_x$ and $s_r^2$ replacing $s_x^2$,
to obtain the statistics for $PIG_r$.
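
The conversion from standard to regional pigment statistics can be sketched as below. The function name and all parameter values are hypothetical; the relations follow from solving the standard algorithm for the logarithm of the radiance ratio and substituting it into the regional algorithm.

```python
import math

def regional_log_stats(m_s, s2_s, A_s, B_s, A_r, B_r):
    """Mean and variance of ln(PIG_r) from those of ln(PIG).

    A_s, B_s: parameters of the standard (global) algorithm.
    A_r, B_r: regionally derived parameters.
    """
    ratio = B_r / B_s
    m_r = math.log(A_r) + ratio * (m_s - math.log(A_s))   # eq. (32)
    s2_r = ratio * ratio * s2_s                           # eq. (33)
    return m_r, s2_r

# Hypothetical statistics of ln(PIG) and hypothetical algorithm parameters.
m_r, s2_r = regional_log_stats(m_s=-0.3, s2_s=0.1,
                               A_s=1.13, B_s=-1.7, A_r=0.9, B_r=-2.0)
print(m_r, s2_r)
```

When the regional parameters equal the standard ones, the function returns the input statistics unchanged, which is a quick consistency check.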

This flexibility is the primary reason that the MLE 
method was chosen over the more commonly used esti- 
mation methods, e.g., arithmetic averages, for estimating 
spatial and temporal means. As shown in Section 3, the 
MLE estimator for the mean proved to be equivalent to 
the arithmetic average for spatial averages of CZCS data, 
and, in most situations, for temporal averages of moored 
fluorometer data. The statistical study detailed in Sec- 
tion 3 provides empirical evidence to support the use of 
the MLE method, as well as theoretical results which ex- 
plain its success and, in some instances, failure for certain 
data sets. 


3. EMPIRICAL BASIS 

In 1992-93, a study was conducted to address statistical
questions related to level-3 binning algorithms for
SeaWiFS data. The questions addressed and recommendations
derived from this study have been presented in
Section 2 of this report. Here, the actual results of this 
study are presented. Results pertaining to spatial binning 
algorithms are presented in Section 3.1, followed by results 
pertaining to temporal binning algorithms in Section 3.2. 
Following the presentation of results, Section 3.3 contains 
a discussion of the major conclusions. Questions concern- 
ing the equivalence of the MLE and AVG methods are 
addressed in this section, and specific situations are de- 
scribed when the two methods would and would not be 
equivalent. 

3.1 Spatial Statistics 

The first step in creating level-3 data involves averaging
data from a single orbital pass. This is considered the
spatial binning step, because the data involved are regarded
as simultaneous.

Three questions related to spatial binning were ad- 
dressed: 




   1. How should level-2 data be averaged to provide
      the best estimate of their mean?

   2. How should level-4 means be estimated?

   3. What statistics should be saved?

These are the first three questions presented and discussed
in Sections 2.2.1-2.2.3.

3.1.1 Methods 

Full-resolution CZCS data were used to address the
aforementioned questions. The procedure was to use the
full-resolution data to define the true mean of each variable
within 9 x 9 km$^2$ bins and to compare other estimates of
the mean against the true mean.

Seven scenes were selected as representative of the full
range of variability in CZCS data. Details of these scenes
are given in Table 1. The level-1 data were processed
according to standard algorithms using the DSP ANLY2DBL
code [Rosenstiel School of Marine and Atmospheric Science
(RSMAS) 1990]. (The version of ANLY2DBL.EXE used
in processing CZCS data was created 19 April 1990, and
modified 18 September 1991.) The resulting level-2 variables
involved in this study were:

   $L_{WN}(\lambda_i)$   normalized water-leaving radiances in bands
                         $i$ = 1-3,
   CHL                   pigment concentration (chlorophyll), and
   $K_{490}$             diffuse attenuation coefficient at
                         $\lambda$ = 490 nm.

The normalized water-leaving radiances are radiances
corrected for variations in solar zenith angle across the scan.
All radiances are corrected to correspond to a solar zenith
angle of zero. Details of the algorithms used may be found
in Gordon et al. (1988).

The algorithm for $K_{490}$ was

$$K_{490} = 0.022 + 0.088 \left[\frac{L_W(\lambda_1)}{L_W(\lambda_3)}\right]^{-1.491} \qquad (34)$$

where $L_W(\lambda_i)$ is the non-normalized water-leaving
radiance in band $i$. The quantity CHL was derived using a
bifurcated algorithm that involved two ratio formulas:

$$CHL_{13} = 1.130 \left[\frac{L_W(\lambda_1)}{L_W(\lambda_3)}\right]^{-1.705} \qquad (35)$$

and

$$CHL_{23} = 3.327 \left[\frac{L_W(\lambda_2)}{L_W(\lambda_3)}\right]^{-2.44} \qquad (36)$$

According to this algorithm, CHL was equal to $CHL_{13}$
except when both formula values exceeded 1.5 mg m$^{-3}$, in
which case CHL was equal to $CHL_{23}$. The $CHL_{13}$ ratio
was employed in all of the scenes analyzed, whereas
the $CHL_{23}$ ratio was employed in only three of the seven
scenes.

After the scenes were processed to standard level-2
data, pixels in each scene were sorted into 9 x 9 km$^2$ bins
oriented in rows perpendicular to the ground track of the
satellite. Based on an instantaneous field-of-view (IFOV)
angle of $0.865 \times 10^{-3}$ radians (0.0496°) and a sensor altitude
of 955 km (and ignoring tilt), the spatial resolution of pixels
at nadir is 0.825 km. The maximum number of pixels
that fit into a 9 x 9 km$^2$ bin was 121 (11 x 11). This
occurred only within ±300 pixels of nadir where pixels have
spatial resolutions < 0.9 km.

3.1.1.1 Estimators of the Mean

Only cloud-free bins containing 121 pixels were used for
the analysis. All estimators were evaluated using both
full-resolution (LAC) data and 4 km resolution (GAC) data.
The latter were obtained by subsampling every fifth pixel
on every fifth line (since $5 \times 0.825 \approx 4$ km). Thus, LAC
estimators were based on 121 level-2 observations, whereas
for GAC data, the number of observations (pixels falling in
these bins) ranged from 4-9.

The estimators compared were:

   AVG    arithmetic average (2) based on LAC data,

   AVG4   arithmetic average based on GAC data,

   MLE    maximum likelihood estimator (5) based on
          LAC data,

   MLE4   maximum likelihood estimator based on GAC
          data,

   MED    geometric mean or median estimator (4)
          based on LAC data, and

   MED4   geometric mean or median estimator based
          on GAC data.


For each bin, the AVG estimator based on LAC data
($n = 121$) is given in (2) and was considered the true mean.
In this equation, $X_i$ is the $i$th observation or realization of
the variable $X$ [equal to $L_{WN}(\lambda_1)$, $L_{WN}(\lambda_2)$, $L_{WN}(\lambda_3)$,
CHL, or $K_{490}$], and $n$ is the number of observations
(pixels) falling in a bin. The true mean was computed for
each variable and each bin having $n = 121$ valid observations.
The other estimators of the mean were compared
with $X_{avg}$ to determine how well they performed.
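
The three kinds of estimator can be sketched for a single bin as follows. Equations (2), (4), and (5) are not reproduced in this section, so the sketch assumes the usual forms: the arithmetic average, the exponential of the mean logarithm, and the lognormal MLE mean with the log variance normalized by $n$. The function name and the sample values are hypothetical.

```python
import math

def bin_estimators(values):
    """Return (AVG, MLE, MED) estimates of the mean for one bin.

    values: positive level-2 observations falling in the bin.
    """
    n = len(values)
    logs = [math.log(x) for x in values]
    m = sum(logs) / n                              # mean of ln(X)
    s2 = sum((v - m) ** 2 for v in logs) / n       # variance of ln(X), 1/n form
    avg = sum(values) / n                          # arithmetic average
    med = math.exp(m)                              # geometric mean (median)
    mle = math.exp(m + s2 / 2.0)                   # lognormal MLE of the mean
    return avg, mle, med

# Hypothetical pigment values (mg m^-3) for one bin.
avg, mle, med = bin_estimators([0.8, 1.0, 1.1, 1.3, 2.0])
print(avg, mle, med)
```

By the inequality between arithmetic and geometric means, MED can never exceed AVG, which is consistent with the negative bias reported for the MED estimator.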

3.1.1.2 Standard Level-2 Variables

Let $\mathbf{X} = [L_{WN}(\lambda_1), L_{WN}(\lambda_2), L_{WN}(\lambda_3), CHL, K_{490}]$
refer to the vector of standard variables, and let $Y = f(\mathbf{X})$
be any function that is derived from one or more of the
standard variables.

The arithmetic mean of the function based on LAC
data ($n = 121$)

$$Y_{avg} = \frac{1}{n} \sum_{i=1}^{n} Y_i \qquad (37)$$

was considered its true mean, where $Y_i = f(X_i)$ is the
function calculated at pixel $i$. This defined the AVG
estimator for $Y$. Similarly, the AVG4, MLE, and MLE4
estimators for the mean of $Y$ were defined by substituting $Y_i$
for $X_i$ in the appropriate equations. In addition to these
estimators, the FNC (function) estimator was defined as

$$Y_{fnc} = f(X_{avg}) \qquad (38)$$

where $X_{avg}$ is the arithmetic average of $X$. This would be
the result of calculating the function using level-3 means.
It was called FNC when $X_{avg}$ was the AVG estimator, and
FNC4 when $X_{avg}$ was the AVG4 estimator.

Table 1. CZCS scenes used for the analysis of spatial statistics. The scenes are listed in increasing order
of mean pigment (see Fig. 1). The number of lines listed were for the whole scene, and the number of bins
given is the number of 9 x 9 km$^2$ bins containing data. Time is given in Greenwich Mean Time (GMT) in the
(left-to-right) order of hour, minutes, and seconds. (Note: In Tables 2 and 3, the number of bins listed is the
number of bins containing n = 121 pixels. Only these cloud-free bins were used to define true means in the
images.)

ID   Orbit   Date        Time       Tilt   Location                   Lines   Bins
1    1,200   19 Jan 79    1:56:27    20°   Northwestern Pacific       1,023   3,186
2      218    9 Nov 78    0:52:23     0    Northwestern Pacific       1,023   2,964
3    1,029    6 Jan 79   16:38:13   -14    South Atlantic             1,023   3,087
4    1,016    5 Jan 79   18:33:31     6    Eastern Tropical Pacific   2,376   8,266
5    1,452    6 Feb 79    7:19:17     8    Indian Ocean               1,584   7,475
6      971    2 Jan 79   12:31:21    -2    Northwest of Africa        1,023   1,040
7    1,386    1 Feb 79   12:45:19    20    Southwest of Africa        1,584   4,020

Functions that were investigated were as follows:

   $IC_K$      integral pigment (1) within the upper optical
               depth,

   $Z_e$       1% light depth, and

   $Y_{A,B}$   pigment algorithm $A\,(L_{WN}(\lambda_1)/L_{WN}(\lambda_3))^B$,
               where $A = 1$ and $B = -1$, $-2$, and $-3$.

3.1.1.3 Relative Errors

For each bin, the relative error in an estimate of the
mean, $X_{est}$, was defined as a percentage of the true mean
$X_{avg}$

$$ERROR = \frac{X_{est} - X_{avg}}{X_{avg}} \times 100\% \qquad (39)$$

where $X_{est}$ was the estimate based on the MLE, MED,
AVG4, MLE4, or MED4 estimator. Similarly, relative
errors in estimates of the mean of a function, $Y_{est}$, were
defined as a percentage of $Y_{avg}$, where $Y_{est}$ was the
estimate based on the MLE, FNC, AVG4, MLE4, or FNC4
estimator.
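
The relative error of the FNC estimator can be illustrated with a short sketch. The function names and the two-pixel sample are hypothetical; the power-law function with $A = 1$ and $B = -2$ is one of the forms listed above.

```python
def relative_error(estimate, true_mean):
    """Relative error in percent of the true mean, eq. (39)."""
    return (estimate - true_mean) / true_mean * 100.0

def avg_estimate(f, x_values):
    """Arithmetic mean of f evaluated pixel by pixel, eq. (37)."""
    return sum(f(x) for x in x_values) / len(x_values)

def fnc_estimate(f, x_values):
    """Function evaluated at the arithmetic mean of X, eq. (38)."""
    x_avg = sum(x_values) / len(x_values)
    return f(x_avg)

# Hypothetical radiance ratios in one bin and the function Y = X**(-2).
ratios = [1.0, 2.0]
f = lambda x: x ** -2.0

y_avg = avg_estimate(f, ratios)     # (1 + 0.25) / 2
y_fnc = fnc_estimate(f, ratios)     # 1.5 ** -2
err = relative_error(y_fnc, y_avg)
print(y_avg, y_fnc, err)
```

For a convex function such as this one, FNC cannot exceed AVG (Jensen's inequality), so the error is negative; for a concave function the sign is reversed, which is why the direction of the FNC bias depends on the function being evaluated.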

3.1.2 Results

In Table 1, the scenes are listed in order of increasing 
mean pigment. In presenting results, scenes will be identi- 
fied by the number (order) found in column 1 of this table. 


3.1.2.1 Pigment Distributions

The pigment means and coefficients of variation (CV) 
for the seven scenes are compared in Fig. 3. Histograms of 
log(CHL) are shown in Fig. 4, where the abscissa is the 8- 
bit image value V, which is related to the logarithm (base 
10) of pigment as 

log(CHL) = -1.4 + 0.012 V. (40) 

The distributions of log(CHL) shown in Fig. 4 appear to 
be either single normal distributions, e.g., scene 1, or mix- 
tures of normal distributions, e.g., scene 3. Thus, CHL is 
approximately lognormally distributed within each scene 
or within portions of each scene. 

In scenes 4, 6, and 7, the bifurcated CHL algorithm
resulted in a discontinuity at CHL = 1.5 mg m$^{-3}$ (V =
132). Values to the left of V = 132 have been calculated
according to $CHL_{13}$ (35), whereas values to the right were
calculated according to $CHL_{23}$ (36). This is an artifact of
the CZCS pigment algorithm, which will be avoided when
defining the SeaWiFS CHL algorithm. In scenes 6 and
7, CHL was recalculated using the $CHL_{13}$ algorithm for
all pixels. The resulting CHL distributions are shown in
Fig. 5.
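
The bifurcated algorithm and the image-value scaling can be sketched as below. The coefficients are read from (35), (36), and (40), with the second number in each ratio formula taken as the exponent; the function names and the example radiances are hypothetical.

```python
import math

def chl13(lw1, lw3):
    """Pigment from the band 1 / band 3 radiance ratio, eq. (35)."""
    return 1.130 * (lw1 / lw3) ** -1.705

def chl23(lw2, lw3):
    """Pigment from the band 2 / band 3 radiance ratio, eq. (36)."""
    return 3.327 * (lw2 / lw3) ** -2.44

def chl_bifurcated(lw1, lw2, lw3, threshold=1.5):
    """CHL13 unless both formulas exceed the threshold, then CHL23."""
    c13, c23 = chl13(lw1, lw3), chl23(lw2, lw3)
    return c23 if (c13 > threshold and c23 > threshold) else c13

def image_value(chl):
    """Invert eq. (40): log10(CHL) = -1.4 + 0.012 V."""
    return (math.log10(chl) + 1.4) / 0.012

low = chl_bifurcated(1.0, 1.0, 1.0)    # CHL13 = 1.130, below the threshold
high = chl_bifurcated(0.5, 1.0, 1.0)   # both formulas exceed 1.5
print(low, high, image_value(1.5))
```

The switch between formulas at 1.5 mg m$^{-3}$ is what produces the discontinuity in the histograms: the smallest integer image value whose pigment exceeds the threshold is V = 132.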

3.1.2.2 Comparison of Estimators

Representative results for estimators of CHL are shown 
in Figs. 6 and 7. Each point in these scatter plots corre- 
sponds to a bin in scene 4, the scene with the highest over- 
all variance. The scales are log-log. In Fig. 6, the MLE, 
MED, MLE4, and MED4 estimates are plotted against the 
AVG estimate. The patterns shown here are typical of 
those observed in all the scenes analyzed. In all scenes, 
the MLE estimator was nearly identical to the AVG esti- 
mator, whereas the MED estimator underestimated AVG. 
There was no discernible difference between the MLE4 ver- 
sus AVG and MED4 versus AVG plots. Both contained 
substantially more scatter than the plots involving MLE 
and MED estimates. 
Fig. 3. The mean pigment (upper panel) and coefficient of variation (lower panel) for the seven CZCS scenes
used in this analysis. The scenes are ordered from lowest to highest mean pigment. The numbers appearing
above each bar are the mean pigment (mg m$^{-3}$) and coefficient of variation (standard deviation expressed as
a percentage of the mean).

Fig. 4. Pigment histograms of seven CZCS scenes used in this analysis. The abscissa is the image value V
which is linearly related to the logarithm of pigment: log(CHL) = -1.4 + 0.012 V.


In Fig. 7, the AVG4 estimator is compared with the 
AVG and MLE4 estimators. In the AVG4 versus AVG plot, 
the scatter is strictly the result of sample size differences; 
whereas in the AVG4 versus MLE4 plot, the scatter is the 
result of differences between the estimators. It is clear from 
these comparisons that the errors associated with GAC 
estimators were predominantly the result of their reduced 
sample size. When two GAC estimators were compared, 
e.g., AVG4 versus MLE4 in Fig. 7, the two agreed as well 
as the corresponding LAC estimators. 
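
The reduction in sample size from 121 pixels to between 4 and 9 can be reproduced with a short counting sketch. It idealizes a bin as an 11 x 11 block of LAC pixels and lets the every-fifth-pixel sampling grid start at any offset within the block; the function name and this alignment model are assumptions.

```python
def gac_count(row_offset, col_offset, size=11, step=5):
    """Pixels retained in a size x size block when every step-th pixel on
    every step-th line is kept, with the sampling grid starting at the
    given offsets (0 <= offset < step) inside the block."""
    rows = len(range(row_offset, size, step))
    cols = len(range(col_offset, size, step))
    return rows * cols

# Enumerate every possible alignment of the sampling grid with the bin.
counts = sorted({gac_count(r, c) for r in range(5) for c in range(5)})
print(counts)
```

Under this model a bin retains 4, 6, or 9 subsampled pixels depending on alignment, so GAC estimators are based on less than one-tenth of the observations available to the LAC estimators.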

Color Plates 1 and 2 show level-3 mean CHL images for
the seven scenes. That is, each pixel in these images is a bin
in the level-3 data. Plate 1 compares the AVG and MLE
estimators, and Plate 2 compares the AVG4 and MLE4
estimators. Difference images are shown in Plate 3.
Differences between the MLE and AVG estimators seemed to
be spatially organized with the largest differences located
along fronts and coastlines. In contrast, there were no
obvious spatial patterns in the differences between MLE4 and
AVG4 estimators.

The combined histograms of relative errors (39) in CHL 
estimators from all seven scenes are shown in Figs. 8 and 
9, and summarized in Table 2a. In all but a few cases, 
the MLE estimator differed from the AVG estimator by 
less than 1%; whereas, the MED estimator consistently 
underestimated the mean CHL. Its bias or average error 
was —2.1%, and 95th percentile range was —11% to — 1%. 

All three GAC estimators had a tendency to underestimate
the true mean CHL. Errors associated with the
AVG4 estimator are strictly the result of reducing sample
sizes from $n = 121$ in the AVG estimator to $n \le 9$ in
the AVG4 estimator. The error histograms for AVG4 and
MLE4 are remarkably similar. Their biases were -0.76%
and -0.75%, respectively, and their 95th percentile range
was -19% to +18%. The MED4 tended to underestimate
the true mean as did the other GAC estimators, but the
MED4 was a poorer estimator, as indicated by its larger†
negative bias (-2.60%).

In the LAC error histograms (Fig. 8), true differences in
the performance of the estimators may be seen; whereas,
in the GAC histograms (Fig. 9), errors associated with
reduced sample size are added to errors or differences
between estimators. Differences between GAC estimators

$$DIFF1 = \frac{MLE4 - AVG4}{AVG4} \times 100\% \qquad (41)$$

and

$$DIFF2 = \frac{MED4 - AVG4}{AVG4} \times 100\% \qquad (42)$$

were examined. Here, a distinction is made between errors
(39), which are relative to the true mean (AVG), and
differences, (41) and (42), which are relative to AVG4, another
estimate of the mean.

Histograms of DIFF1 and DIFF2 are shown in Fig. 10.
These results for GAC estimators are very similar to the
patterns seen when comparing LAC estimators (compare
Fig. 10 with Fig. 8). The AVG4 and MLE4 estimators
agree as well as the AVG and MLE estimators do; differences
between the two methods of estimating the mean CHL
are negligible. Likewise, differences between the MED4
and AVG4 estimators followed the same pattern as
differences between the MED and AVG estimators. In both
cases, the geometric mean underestimated the arithmetic
average. The large errors in AVG4, MLE4, and MED4
(Fig. 9) were clearly dominated by the sample size
effect.

The patterns seen in Figs. 8-10 for CHL estimators
are similar to those that are obtained for other variables.
GAC error histograms for the other variables (comparable
to Fig. 9) are shown in Figs. 11-14, and summaries
of relative errors are listed in Tables 2a-e for CHL, $K_{490}$,
$L_{WN}(\lambda_1)$, $L_{WN}(\lambda_2)$, and $L_{WN}(\lambda_3)$, respectively. The last
two columns on the right in these tables give the 95th
percentile range for the relative errors. All GAC estimators
had negative biases. AVG4 and MLE4 were nearly identical
with average errors on the order of -1%; whereas,
MED4 had average errors of approximately -2%. The
CHL variable had the highest overall errors, with a 95th
percentile range generally around ±20%; whereas, the other
variables had errors that were generally within ±10%.

† In referring to biases, the terms larger and smaller refer to
the magnitude or absolute value of the bias.

Fig. 5. CHL histograms for scenes 6 and 7 derived using the CHL13 algorithm only.

Fig. 6. In these scatter plots, four estimators of the mean are compared with the true mean (AVG) defined
as the arithmetic average of all pixels in a bin (sample size = 121). The level-2 data used were CZCS-derived
pigment values from scene 4 (see Table 1). Like the AVG estimator, the MLE and MED estimators are based
on full-resolution (LAC) data, whereas the MLE4 and MED4 estimators are based on 4 km subsampled (GAC)
data. The scales on each plot are log-log where the range is from 0.04 (V=0) to 45 (V=255), where V is the
8-bit image value [see (40)].

Fig. 7. In these scatter plots, the ordinate (AVG4) is the arithmetic average based on 4 km subsampled
(GAC) data for the same scene as in Fig. 6. The plot on the left compares this estimator with the average
based on full-resolution (LAC) data. The scatter in this plot is strictly the result of sample size differences.
The AVG4 has less precision since its sample size is reduced from $n = 121$ (LAC) to $n \le 9$ (GAC). The plot
on the right compares the AVG4 and MLE4 estimators. Like the MLE and AVG estimators (Fig. 6), the
MLE4 and AVG4 are practically identical.

Fig. 8. Histograms of CHL estimation errors based on 21,290 bins analyzed and full-resolution (LAC) data.
For each bin, the error is defined as the difference between the estimator and the arithmetic average (AVG)
of all data in the bin expressed as a percentage of AVG. The top histogram shows the error calculated as
(MLE - AVG)/AVG (%). The bottom histogram shows the error calculated as (MED - AVG)/AVG (%).

Fig. 9. Histograms of CHL estimation errors based on 21,290 bins analyzed and 4 km subsampled (GAC)
data. The top histogram shows the error calculated by (AVG4 - AVG)/AVG (%). The middle histogram
shows the error calculated by (MLE4 - AVG)/AVG (%). The bottom histogram shows the error calculated
by (MED4 - AVG)/AVG (%).

Fig. 10. Histograms of DIFF1 and DIFF2 based on 21,290 bins analyzed. The top histogram was calculated
with (MLE4 - AVG4)/AVG4 (%). The bottom histogram was calculated with (MED4 - AVG4)/AVG4 (%).

Fig. 11. Histograms of $K_{490}$ estimation errors based on 20,373 bins analyzed and 4 km subsampled (GAC)
data. The top histogram was calculated using (AVG4 - AVG)/AVG (%). The bottom histogram was
calculated for (MLE4 - AVG)/AVG (%).

Fig. 11. (cont.) Histogram of $K_{490}$ estimation errors based on 20,373 bins analyzed and 4 km subsampled
(GAC) data was calculated using (MED4 - AVG)/AVG (%).

Fig. 12. Histogram of $L_{WN}(443)$ estimation errors based on 21,290 bins analyzed and 4 km subsampled
(GAC) data was calculated for (AVG4 - AVG)/AVG (%).

Fig. 12. (cont.) Histograms of $L_{WN}(443)$ estimation errors based on 21,290 bins analyzed and 4 km
subsampled (GAC) data. The top histogram was calculated for (MLE4 - AVG)/AVG (%). The bottom histogram
was calculated for (MED4 - AVG)/AVG (%).

Fig. 13. Histograms of $L_{WN}(520)$ estimation errors based on 21,290 bins analyzed and 4 km subsampled
(GAC) data. The top histogram was calculated for (AVG4 - AVG)/AVG (%). The bottom histogram was
calculated for (MLE4 - AVG)/AVG (%).

Fig. 13. (cont.) Histogram of $L_{WN}(520)$ estimation errors based on 21,290 bins analyzed and 4 km
subsampled (GAC) data was calculated for (MED4 - AVG)/AVG (%).

Fig. 14. Histogram of $L_{WN}(550)$ estimation errors based on 21,290 bins analyzed and 4 km subsampled
(GAC) data was calculated for (AVG4 - AVG)/AVG (%).

Fig. 14. (cont.) Histograms of $L_{WN}(550)$ estimation errors based on 21,290 bins analyzed and 4 km
subsampled (GAC) data. The top histogram was calculated for (MLE4 - AVG)/AVG (%). The bottom histogram
was calculated for (MED4 - AVG)/AVG (%).

The three scenes that used both CHL13 and CHL23 had 
substantially higher errors than those of the other scenes. 
These scenes also had the highest variance in CHL and 
other variables. Although higher estimation errors would 
be expected when sampling from distributions with higher 
variance, there was the need to determine whether the 
CHL errors were anomalously large due to the bifurcated 
CHL algorithm. The higher variance in CHL might have 
been an artifact resulting from the discontinuous nature of 
the pigment distribution. 

To determine whether this was true, the analysis was 
repeated for scenes 6 and 7 using CHL13 to derive CHL 
for all pixels (Fig. 5). The results were essentially the 
same. These images still had large intrabin variances in 
CHL (CHL 13), and their error distributions (not shown) 
were essentially unchanged. 

Statistics pertaining to the $IC_K$ function estimators are
presented in Table 3, and error histograms for the MLE
and FNC estimators are shown in Fig. 15 and for the
AVG4, MLE4, and FNC4 estimators in Fig. 16. In contrast
to the MED estimator, the FNC estimator tended to
overestimate the true mean. In the case of the FNC4
estimator, this tendency (positive bias) was apparently offset
by the underestimation tendency (negative bias) associated
with small sample sizes. The result was that the bias
of the FNC4 estimator was nearly zero.

In Section 2.4, a protocol was presented for estimating
the mean of level-4 variables of the form $Y = AX^B$ based
on saved statistics of the level-2 variable $X$. The accuracy
of the prescribed protocol depends strictly on whether the
MLE estimator is a good approximation to the AVG or
true mean of these functions.

Results for the $Z_e$ and $Y_{A,B}$ functions (not shown)
established that the MLE estimator was essentially identical
to the AVG estimator. Root-mean-square (rms) errors for
the MLE4 and AVG4 estimators were within ±5% for $Z_e$.
Errors for $Y_{A,B}$ increased as $B$ changed from -1 to -3,
with the highest rms errors being associated with scenes
6 and 7. MLE4 and AVG4 errors were within ±5% for
$B = -1$, within ±15% for $B = -2$, and within ±30% for $B = -3$.
These ranges are consistent with the results for the CHL
algorithm where $B = -1.7$ (CHL13) and $-2.4$ (CHL23).
As in the case of the $IC_K$ function, the FNC4 estimator
was not significantly different from the MLE4 and AVG4
estimators. In all three cases, errors were dominated by
the effects of reduced sample size.


3.2 Temporal Statistics 

After the spatial statistics are derived from data on a 
single orbital pass, these statistics will be averaged over 
time to produce temporal statistics. No further reduction 
in spatial resolution takes place, but after being averaged 
over time, temporal statistics will have reduced temporal 
resolution. 

Statistical questions regarding the use of weighted ver- 
sus unweighted statistics have been discussed above. These 
questions were not addressed in this study. This phase of 
the study focused on questions concerning the performance 
of the estimators studied in the earlier (spatial statistics) 
phase of the study. Specifically, the questions addressed 
were: 

1. Would the MLE estimator continue to be equiv- 
alent to the AVG estimator as variance increases 
due to temporal variability? 

2. Would the MED and FNC estimators diverge 
further from the AVG? 

In other words, the goal of this phase of the study was 
to determine whether the results obtained for spatial statis- 
tics would also pertain to temporal statistics. The greatest 
differences between the MLE and AVG estimators occurred 
in bins having the highest variance. Since temporal statis- 
tics, in general, will have increased variance due to tem- 
poral variability within bins, it was not known whether 
the MLE and AVG estimators would remain equivalent. 
Furthermore, it was predicted that the small but system- 
atic errors in the MED and FNC estimators would increase 
with increases in variance. 

3.2.1 Methods 

Ideally, several time series of CZCS images from differ- 
ent geographic regions should be analyzed to address these 
questions. However, this approach was not considered fea- 
sible. Since CZCS was operated only 10% of the time, its 
sampling frequency for any bin was much lower than that 
expected for SeaWiFS, which will operate continuously. 

To investigate how phytoplankton pigment distributions
vary over time at a fixed location, and to answer the
above questions, the Shelf Edge Exchange Program II
(SEEP II) moored fluorometer data (Medeiros and Wirick
1992) were analyzed. These data consisted of temporal
records of chlorophyll fluorescence from six moored fluo- 
rometer arrays located along the outer edge of the conti- 
nental shelf off the Delmarva Peninsula. The mooring ar- 
rays were deployed between February 1988 and May 1989. 
Details of the SEEP II data are given in Table 4. 

At each mooring, a time series of daily satellite-derived
surface chlorophyll measurements was simulated by selecting
the SEEP measurement closest to 10 AM from the shallowest
fluorometer. The depths of these instruments ranged
from 16 to 39 m (see Table 4).

Table 2a. Summary of relative errors for CHL estimators.

Estimator  Scene      Number     Bias     Error        95% Range
Used       Number    of Bins      [%] (rms) [%]   Minimum  Maximum

MLE        1           2,750     0.00      0.00        -1        0
           2           1,850     0.00      0.00        -1        0
           3           2,584    -0.03      1.26        -1        0
           4           5,773    -0.03      0.42        -1        0
           5           4,535    -0.05      1.42        -1        0
           6             513     0.07      1.19        -2        2
           7           3,285    -0.21      1.13        -3        1
           Combined   21,290    -0.05      0.95        -1        0

MED        1           2,750    -1.14      1.20        -3       -1
           2           1,850    -1.16      1.25        -3       -1
           3           2,584    -1.08      2.15        -6        0
           4           5,773    -1.98      2.95        -8       -1
           5           4,535    -1.37      2.27        -3       -1
           6             513    -4.67      7.32       -24        0
           7           3,285    -5.13      7.38       -21       -1
           Combined   21,290    -2.11      3.74       -11       -1

AVG4       1           2,750     1.12      7.28       -13       15
           2           1,850     0.68      7.07       -13       15
           3           2,584    -2.00      6.07       -14        8
           4           5,773    -3.12      9.35       -21       13
           5           4,535     0.00      7.31       -14       14
           6             513     0.71     11.79       -22       25
           7           3,285     0.71     16.72       -26       42
           Combined   21,290    -0.76      9.86       -19       18

MLE4       1           2,750     1.12      7.29       -13       15
           2           1,850     0.69      7.06       -13       15
           3           2,584    -1.99      6.06       -14        8
           4           5,773    -3.10      9.26       -21       13
           5           4,535    -0.01      7.25       -14       14
           6             513     0.88     11.86       -22       25
           7           3,285     0.69     16.51       -26       41
           Combined   21,290    -0.75      9.77       -19       18

MED4       1           2,750     0.17      7.14       -14       14
           2           1,850    -0.26      6.95       -14       14
           3           2,584    -2.96      6.50       -16        7
           4           5,773    -4.81      9.95       -24       11
           5           4,535    -1.04      7.21       -15       13
           6             513    -3.50     12.52       -30       18
           7           3,285    -4.07     14.02       -32       24
           Combined   21,290    -2.60      9.38       -22       14

Table 2b. Summary of relative errors for K490 estimators.

Estimator  Scene      Number     Bias
Used       Number    of Bins      [%]

MLE        1           2,757     0.00
           2           1,875     0.00
           3           2,584     0.00
           4           5,773     0.00
           5           4,562     0.00
           6             424     0.00
           7           2,398     0.00
           Combined   20,373     0.00

MED        1           2,757     0.00
           2           1,875     0.00
           3           2,584    -0.05
           4           5,773    -0.16
           5           4,562    -0.12
           6             424    -1.09
           7           2,398    -0.93
           Combined   20,373    -0.21

AVG4       1           2,757     0.26
           2           1,875     0.20
           3           2,584    -0.69
           4           5,773    -1.02
           5           4,562     0.01
           6             424     0.37
           7           2,398     0.06
           Combined   20,373    -0.31

MLE4       1           2,757     0.26
           2           1,875     0.20
           3           2,584    -0.68
           4           5,773    -1.03
           5           4,562     0.01
           6             424     0.37
           7           2,398     0.05
           Combined   20,373    -0.31

MED4       1           2,757     0.21
           2           1,875     0.14
           3           2,584    -0.80
           4           5,773    -1.27
           5           4,562    -0.24
           6             424    -0.75
           7           2,398    -0.81
           Combined   20,373    -0.59

Table 2c. Summary of relative errors for L_WN(λ1) estimators.


Estimator 

Used 

MLE 


Scene 

Number 

1 

2 

3 

4 

5 

6 
7 

Combined 



AVG4 


MED4 


Combined 


1 

2 

3 

4 

5 

6 
7 


Combined 


1 
2 

3 

4 

5 

6 
7 

Combined 

1 

2 

3 

4 

5 

6 
7 

Combined 


Number 
of Bins 

2,750 

1,850 

2,584 

5,773 

4,535 

513 

3,285 

21,290 


21,290 


2,750 

1,850 

2,584 

5,773 

4,535 

513 

3,285 


21,290 


2,750 

1,850 

2,584 

5,773 

4,535 

513 

3,285 

21,290 

2,750 

1,850 

2,584 

5,773 

4,535 

513 

3,285 

21,290 


-0.70 


0.35 

0.46 

-0.76 

-1.58 

0.73 

0.08 

-4.63 


-0.99 


0.35 

0.46 

-0.76 

-1.58 

0.73 

0.09 

-3.95 

^ 0.89 

0.30 

0.39 

- 0.88 

-1.85 

0.52 

-0.50 

-7.94 

- 1.66 


Error 
(rms) [%] 

0.00 

0.00 

0.00 

0.31 

0.32 

0.04 

3.39 

1.35 


95% Range 

Minimum Maximum 


2,750 

0.00 

0.00 

1,850 

0.00 

0.02 

2,584 

-0.03 

0.19 

5,773 

-0.20 

0.74 

4,535 

-0.08 

0.41 

513 

-0.60 

1.13 

3,285 

-3.97 

9.10 


-1 

0 

-1 

0 

-1 

0 

-2 

0 

-2 

0 

-4 

0 

-31 

0 

-6 

0 

-3 

3 

-4 

3 

-5 

2 

-9 

4 

-6 

6 

-9 

8 

-36 

11 



-3 

3 

-4 

3 

-5 

2 

-9 

4 

-6 

6 

-9 

8 

33 

15 

-11 

5 

-3 

3 

-4 

3 

-5 

2 

•10 

3 

-6 

6 

■10 

7 

•54 

6 

-16 

5 
Table 2d. Summary of relative errors for L_WN(λ2) estimators. Blank cells are values missing from the source.

Estimator  Scene      Number     Bias     Error        95% Range
Used       Number    of Bins      [%] (rms) [%]   Minimum  Maximum

MLE        1           2,750     0.01      0.12        -1        0
           2           1,850     0.04      0.29        -1        0
           3           2,584     0.00      0.03        -1        0
           4           5,773     0.03      0.22        -1        0
           5           4,535     0.02      0.20        -1        0
           6             513     0.03      0.22        -1        0
           7           3,285     0.03      0.36        -1        0
           Combined   21,290     0.02      0.23        -1        0

MED        1           2,750    -0.04      0.24        -1        0
           2           1,850    -0.15      0.55        -2        0
           3           2,584    -0.09      0.33        -2        0
           4           5,773    -0.27      0.73        -3        0
           5           4,535                           -2        0
           6             513    -1.26      2.45        -9        0
           7           3,285               2.23        -7        0
           Combined   21,290    -0.30      1.07        -3        0

AVG4       1           2,750     0.43      3.09        -6        6
           2           1,850     0.49      3.61        -7        7
           3           2,584    -1.29      2.80        -7        3
           4           5,773    -2.30      4.55       -11        5
           5           4,535     0.57      3.24        -6        6
           6             513     0.37      5.41       -13       11
           7           3,285    -2.50      6.08       -14        7
           Combined   21,290    -0.94      4.19       -10        6

MLE4       1                     0.44                  -6        6
           2                     0.49                  -7        7
           3                    -1.29                  -7        3
           4                    -2.27                 -11        5
           5                     0.59                  -6        6
           6                     0.41                 -13       11
           7                    -2.48                 -14        7
           Combined   21,290    -0.92      4.16       -10        6

MED4       1           2,750     0.25                  -7        6
           2           1,850     0.22                  -8        7
           3           2,584    -1.48                  -7        3
           4           5,773    -2.68                 -12        4
           5           4,535     0.34                  -7        6
           6             513    -0.78                 -16        9
           7           3,285    -3.38                 -17        5
           Combined   21,290    -1.32      4.53       -11        6

Table 2e. Summary of relative errors for L_WN(λ3) estimators. Blank cells are values missing from the source.

Estimator  Scene      Number     Bias     Error        95% Range
Used       Number    of Bins      [%] (rms) [%]   Minimum  Maximum

MLE        1           2,750     0.01      0.09        -1        0
           2           1,850     0.03      0.22        -1        0
           3           2,584     0.00      0.03        -1        0
           4           5,773     0.02      0.19        -1        0
           5           4,535     0.00      0.07        -1        0
           6             513     0.06      0.35        -1        1
           7           3,285     0.01      0.25        -1        0
           Combined   21,290     0.01      0.17        -1        0

MED        1           2,750    -0.60      0.79        -2        0
           2           1,850    -0.77      0.95        -2        0
           3           2,584    -0.19      0.50        -2        0
           4           5,773    -0.67      1.02        -3        0
           5           4,535    -0.13      0.38        -2        0
           6             513    -2.09      3.50       -11        0
           7           3,285    -1.02      2.01        -6        0
           Combined   21,290    -0.59      1.19        -3        0

AVG4       1           2,750     1.07      4.75        -9       10
           2           1,850     0.94      4.88        -9       10
           3           2,584    -1.92      3.75        -9        4
           4           5,773    -3.43      6.18       -15        6
           5           4,535     0.77      3.91        -8        8
           6             513     0.57      6.90       -15       14
           7           3,285    -3.16      6.37       -15        7
           Combined   21,290    -1.25      5.26       -13        8

MLE4       1           2,750     1.07                  -9       10
           2           1,850     0.95                  -9       10
           3           2,584    -1.92                  -9        4
           4           5,773    -3.40                 -15        6
           5           4,535     0.77                  -8        8
           6             513     0.65                 -14       14
           7           3,285    -3.14                 -15        7
           Combined   21,290    -1.24      5.24       -13        8

MED4       1                     0.67                  -9        9
           2                     0.49                 -10        9
           3                    -2.24                 -10        3
           4                    -4.07                 -16        5
           5                     0.47                  -8        8
           6                    -1.26                 -20       11
           7                    -4.08                 -17        6
           Combined   21,290    -1.81      5.63       -14        8

Table 3. Summary of relative errors for IC_K estimators.


Estimator 

Used 

MLE 


Image 

Number 



AVG4 


Combined 

1 

2 

3 

4 

5 

6 
7 


Combined 


1 

2 

3 

4 

5 

6 
7 


Combined 



Combined 


Number 
of Bins 

2,750 

1,850 

2,584 

5,752 

4,533 

423 

2,379 

20,271 

2,750 

1,850 

2,584 

5,752 

4,533 

423 

2,379 


20,271 


2,750 

1,850 

2,584 

5,752 

4,533 

423 

2,379 


20,271 


2,750 

1,850 

2,584 

5,752 

4,533 

423 

2,379 

0,271 

2,750 

1,850 

2,584 

5,752 

4,533 

423 

2,379 

0,271 


0.00 

0.00 

- 0.01 

- 0.01 

- 0.02 

- 0.01 

-0.04 

- 0.01 

0.14 

0.23 

0.39 

0.86 

0.80 

2.57 

1.84 


0.78 


0.87 

0.49 

-1.34 

-2.19 

0.07 

0.61 

-0.30 


-0.64 


0.87 

0.49 

-1.34 

-2.18 

0.07 

0.62 

-0.33 

-0.64 

1.19 

0.82 

-0.91 

-1.45 

0.59 

2.86 

1.28 

0.05 
0.00
7.96

Fig. 15. Histograms of CHL/K490 estimation errors based on 20,271 bins analyzed and full resolution (LAC)
data. For each bin, the error is defined as the difference between the estimator and the arithmetic average
(AVG) of all data in the bin expressed as a percentage of AVG. The FNC estimator is the AVG estimator of
CHL divided by the AVG estimator of K490. The top histogram was calculated for (MLE - AVG)/AVG (%).
The bottom histogram was calculated for (FNC - AVG)/AVG (%).

Fig. 16. Histograms of CHL/K490 estimation errors based on 20,271 bins analyzed and 4 km subsampled
(GAC) data. The top histogram was calculated for (AVG4 - AVG)/AVG (%). The bottom histogram was
calculated for (FNC4 - AVG)/AVG (%).

Table 4. Location and depth of SEEP II moored fluorometers and the period covered by the time series data
used in the analysis of temporal statistics.

ID  Deployment  Latitude     Longitude    Depth   Time Series  Time Series
                [deg. min.]  [deg. min.]    [m]   Start        Finish

1   Spring      37 52.60     74 43.90        39   7 Feb 88     8 Apr 88
    Summer      37 52.49     74 43.90        18   25 Jun 88    19 Oct 88
    Winter      37 47.62     74 44.60        19   12 Nov 88    17 Mar 89
2   Spring      37 46.11     74 29.50        16   8 Feb 88     9 Apr 88
    Winter      37 34.69     74 35.13        24   15 Nov 88    28 Jan 89
3   Spring      37 41.99     74 20.35        19   8 Feb 88     9 Jun 88
    Summer      37 41.98     74 20.37        21   25 Jun 88    17 Oct 88
    Winter      37 41.96     74 20.27        19   11 Nov 88    8 May 89
5   Spring      37 39.80     74 15.85        21   8 Feb 88     7 Jun 88
    Summer      37 39.78     74 15.72        22   26 Jun 88    17 Oct 88
    Winter      37 39.73     74 15.78        21   15 Nov 88    2 May 89
6   Spring      37 37.91     74 12.86        20   12 Feb 88    7 Jun 88
    Summer      37 37.90     74 12.87        20   25 Jun 88    19 Oct 88
    Winter      37 37.95     74 12.77        35   13 Nov 88    6 May 89
8   Spring      36 52.63     74 39.04        22   13 Feb 88    8 Jun 88

K490 was derived from the chlorophyll measurement by
the formula

    K_{490} = 0.022 + 0.079 \, \mathrm{CHL}^{0.875}.    (43)

This is the relationship between K490 (34) and CHL13 (35).
In the CZCS imagery analyzed, this relationship would
hold for most of the data since CHL equals CHL13 in most
pixels.
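
As a minimal sketch, the conversion in (43) can be written as a one-line function. The function name and the input check are illustrative choices made here, not part of the original processing code.

```python
def k490_from_chl(chl: float) -> float:
    """Return K490 computed from a chlorophyll value using Eq. (43).

    Hypothetical helper for illustration; the coefficients 0.022, 0.079,
    and 0.875 are taken directly from Eq. (43).
    """
    if chl < 0:
        raise ValueError("chlorophyll concentration must be non-negative")
    return 0.022 + 0.079 * chl ** 0.875
```

For example, `k490_from_chl(1.0)` gives 0.022 + 0.079 = 0.101.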

Weekly and monthly means of CHL and K490 were derived
using the AVG, MLE, and MED estimators. When
sample sizes are small (e.g., n < 7), the effect of sample 
size dominates the error statistics. To control for this ef- 
fect in weekly means, only weeks having 7 days, i.e., no 
missing data, were analyzed. However, because there were 
fewer months, all months were analyzed, regardless of their 
sample size. The AVG estimator was regarded as the true 
mean. Errors for the MLE and MED estimators were ex- 
pressed as a percentage of the AVG estimator. 
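
The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: it assumes the MLE estimator takes the standard lognormal form exp(m + s^2/2) and the MED estimator the form exp(m), where m and s^2 are the mean and (divide-by-n) variance of the natural logarithm of the data. The function names are hypothetical.

```python
import math


def avg(values):
    """Arithmetic average (the AVG estimator, treated as the true mean)."""
    return sum(values) / len(values)


def log_mean_var(values):
    """Mean and divide-by-n variance of ln(x); all values must be positive."""
    logs = [math.log(v) for v in values]
    m = sum(logs) / len(logs)
    s2 = sum((x - m) ** 2 for x in logs) / len(logs)
    return m, s2


def mle(values):
    """Assumed lognormal MLE of the mean: exp(m + s^2 / 2)."""
    m, s2 = log_mean_var(values)
    return math.exp(m + 0.5 * s2)


def med(values):
    """Assumed lognormal median estimate: exp(m)."""
    m, _ = log_mean_var(values)
    return math.exp(m)


def relative_error(estimate, reference):
    """Error expressed as a percentage of the reference (AVG) value."""
    return 100.0 * (estimate - reference) / reference
```

For a complete week of seven made-up daily values such as `[0.5, 0.6, 0.8, 1.0, 1.2, 0.9, 0.7]`, `relative_error(mle(week), avg(week))` is a small fraction of a percent, while `relative_error(med(week), avg(week))` is negative.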

Weekly and monthly means of the function IC_K (1)
were also derived. Estimators compared with the AVG or
true mean were the MLE and MED estimators, and an
FNC estimator defined in two ways:

    \mathrm{FNC(AVG)} = \frac{\text{AVG estimator of CHL}}{\text{AVG estimator of } K_{490}}    (44)

and

    \mathrm{FNC(MLE)} = \frac{\text{MLE estimator of CHL}}{\text{MLE estimator of } K_{490}}.    (45)

The FNC(MLE) estimator would be applicable if, as recommended,
spatial statistics are derived according to the
MLE estimator.

To investigate the behavior of the AVG, MLE, MED,
and FNC estimators as sample sizes increase over time,
cumulative means were obtained as follows:

AVG(n)  arithmetic average of all data from days 1 to n,

MLE(n)  MLE estimate based on data from days 1 to n,

MED(n)  MED estimate based on data from days 1 to n, and

FNC(n)  FNC estimate based on data from days 1 to n.
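
The cumulative means defined above can be sketched with running sums, so that each new day updates the estimates without revisiting earlier data. This is a hedged illustration: the function name and dictionary keys are hypothetical, the MLE and MED forms exp(m + s^2/2) and exp(m) are assumed standard lognormal estimators, FNC is taken in its FNC(AVG) form (44), and all inputs are assumed positive with missing days already removed.

```python
import math


def cumulative_estimates(chl, k490):
    """Return AVG(n), MLE(n), MED(n) of CHL and FNC(n) for n = 1..N.

    chl and k490 are equal-length sequences of positive daily values.
    FNC(n) here is AVG(n) of CHL divided by AVG(n) of K490.
    """
    results = []
    sum_chl = sum_k = sum_log = sum_log2 = 0.0
    for n, (c, k) in enumerate(zip(chl, k490), start=1):
        sum_chl += c
        sum_k += k
        log_c = math.log(c)
        sum_log += log_c
        sum_log2 += log_c * log_c
        m = sum_log / n
        # Divide-by-n variance of ln(CHL); clamp tiny negative round-off.
        s2 = max(sum_log2 / n - m * m, 0.0)
        results.append({
            "n": n,
            "AVG": sum_chl / n,
            "MLE": math.exp(m + 0.5 * s2),
            "MED": math.exp(m),
            "FNC": (sum_chl / n) / (sum_k / n),
        })
    return results
```

Only the count, the sums of the data, and the sum and sum of squares of the logarithm need to be carried from day to day, which is why accumulating those quantities is sufficient for this kind of analysis.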

The cumulative means began day 1 at the start of each 
deployment. Since each mooring had up to three separate 
deployments (see Table 4), there were 1-3 sets of cumula- 
tive means for each mooring. These were plotted against 
n to observe how the estimators behaved as a function of 
sample size. 

In a similar manner, the behavior of the estimators
as functions of area was investigated using CZCS data.
Beginning at one or two selected locations in each CZCS
scene (Table 1), the estimators were calculated for bins of
increasing area L x L, with L increasing from 9 km to the
size of the image. The maximum value of L was 480 km.
Increases in area may be regarded as analogous to increases 
in time. To the extent that this is true, these results would 
pertain to the estimation of temporal means. 

3.2.2 Results 

Histograms of log(CHL) from each mooring are shown 
in Fig. 17. Based on the normal (Gaussian) appearance 
of these histograms, the distribution of chlorophyll over 
time at a single location is approximately lognormal, or a 
mixture of lognormals. 

Fig. 17. Histograms of CHL from SEEP II moored fluorometer data. Data are from the shallowest fluorometer
at each mooring. All data from moorings 1, 2, 3, 5, 6, and 8 are included in these histograms.

Histograms of relative errors in weekly and monthly
means are shown in Figs. 18-25 and summarized in Table 5.
The upper panel in each figure is the error histogram for
weekly means, and the lower panel is for monthly means.
The patterns seen in Fig. 18 for the MLE estimator of CHL
were similar to those obtained for the MLE estimators of
the other variables. The MLE and AVG estimates agreed
within ±5% most of the time, with a slight tendency for
MLE to exceed AVG, as indicated by the small positive
biases (usually much less than 1%) in all cases. As in the
case of the CZCS data, the K490 and IC_K variables had
much smaller MLE errors than the CHL variable.

The MED estimator had relatively large negative er- 
rors for all three variables. That is, the MED estimator 
underestimated the arithmetic average by 40% or more in 
some cases, and monthly mean errors were about a factor 
of 2 greater than weekly mean errors. 

Errors that are associated with the FNC(AVG) and 
FNC(MLE) estimators are shown in Figs. 24 and 25, re- 
spectively, and are summarized in Table 5d. Both distri- 
butions are positively skewed, with errors as high as 30% 
or more. The FNC(MLE) estimator had larger errors than 
the FNC(AVG) estimator. 

Results for cumulative means provided important in- 
sight concerning the behavior of the estimators, in partic- 
ular when the MLE and AVG estimators were substantially 
different. These insights will be illustrated here with re- 
sults from moorings 3 and 6. Figure 26 shows cumulative 
mean CHL estimates from the spring deployments of moor- 
ing 3 (upper panel) and mooring 6 (lower panel). In the 
case of mooring 3, MLE(n) and AVG(n) remained approx- 
imately equal over the entire averaging period, whereas 
MED(n) was always less than the other two and gradually 
diverged as the averaging period increased. These results 
are typical of what was obtained for the majority of the 
cases. 

The lower panel in Fig. 26 illustrates a case where 
MLE(n) and AVG(n) diverged. The two cumulative means 
showed an abrupt divergence at about day 70; prior to that 
day, they had been nearly equal. Inspection of the data 
(Fig. 27, upper panel) revealed that there were a number 
of anomalously low values beginning after day 60. The 
dark squares in Fig. 27 were data that were missing from 
the original records. These had been set to zero and were 
ignored when calculating cumulative means. However, the 
open squares lying near the horizontal axis were small posi- 
tive values (e.g., 0.01, 0.02, etc.) which may have also been 
bad data. If these are eliminated from the record, then 
MLE(n) and AVG(n) agree (bottom panel of Fig. 27). 

This suggests that the MLE estimator can be sensitive
to outliers, particularly outliers that are close to zero.
When a data value approximately equal to zero is included
in the arithmetic average of n values, the effect is to reduce
the AVG estimator by a factor of (n - 1)/n. However,
the logarithm of a number approximately equal to zero is
a large negative number, and its effect on the statistics of
the logarithm can be extreme. Including this value will reduce
the mean of the logarithm but increase the variance, changes that
have somewhat offsetting effects on the MLE estimator. In general,
however, the net effect will be to increase the MLE
estimator since the variance of the logarithm is increased
substantially by the inclusion of a large negative value.
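
The effect of a single near-zero value can be illustrated numerically. The sketch below uses made-up data (thirty values near 1.0 plus one value of 0.01) and assumes the MLE estimator has the standard lognormal form exp(m + s^2/2), with m and s^2 the mean and divide-by-n variance of the logarithm; the names are hypothetical.

```python
import math


def avg_and_mle(values):
    """Return (AVG, MLE) for positive values, with MLE = exp(m + s^2 / 2)."""
    n = len(values)
    logs = [math.log(v) for v in values]
    m = sum(logs) / n
    s2 = sum((x - m) ** 2 for x in logs) / n
    return sum(values) / n, math.exp(m + 0.5 * s2)


# Made-up record: 30 well-behaved values, then the same record with one
# anomalously low value (0.01) appended.
clean = [0.8, 1.0, 1.25] * 10
with_outlier = clean + [0.01]

avg_clean, mle_clean = avg_and_mle(clean)
avg_out, mle_out = avg_and_mle(with_outlier)

print(f"clean:        AVG = {avg_clean:.3f}, MLE = {mle_clean:.3f}")
print(f"with outlier: AVG = {avg_out:.3f}, MLE = {mle_out:.3f}")
```

With these numbers the AVG estimate drops by a few percent, roughly the factor (n - 1)/n, while the MLE estimate rises by about 20% because the variance of the logarithm grows sharply.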

Another case in which MLE(n) and AVG(n) diverged 
was the summer deployment of mooring 3. The simu- 
lated satellite data from this record are shown in the upper 
panel of Fig. 28, and the cumulative means in the lower 
panel. Like the previous example, there were a number 
of low values in the record. However, it is not obvious 
that these are bad data, and so there is no justification 
for removing them to make MLE(n) and AVG(n) agree. 
MLE(n) was approximately 10% higher than AVG(n) for 
n > 35 days. The cumulative means of K490 and IC_K
for this mooring are shown in Fig. 29. Differences be- 
tween MLE(n) and AVG(n) for these variables were much 
smaller than those for CHL. However, the two FNC esti- 
mates were consistently higher than AVG(n) and MLE(n), 
with differences approaching 30% by the end of the rec- 
ord. 

Cumulative means starting at two locations in CZCS 
scene 4 are illustrated in Fig. 30 (LAC means) and Fig. 31 
(GAC means). In these figures, the cumulative mean CHL 
within areas of size L^2 is plotted against L. In the northern
portion of scene 4 (off the west coast of Mexico), the
MLE and AVG cumulative means diverged at length scales
larger than 50 km. However, in the southern region of this
scene, the MLE and AVG means remained nearly equal for
areas up to 460 x 460 km^2. Results for all the CZCS scenes
are summarized in Table 6. Whenever the MLE and AVG 
estimators diverged for CZCS cumulative means, the AVG 
estimator was greater than the MLE estimator. This oc- 
curred in the scenes that had high chlorophyll levels and/or 
high variances. In contrast, when the MLE and AVG es- 
timators in SEEP data diverged, the MLE estimator was 
usually greater than the AVG estimator. 

3.3 Discussion 

From the study of CZCS and SEEP II data, it was 
concluded that the AVG and MLE estimators are equiva- 
lent with respect to their accuracy as estimators of means 
within sampling domains. The MED and FNC estima- 
tors are not considered acceptable as estimators of the 
mean. The MED estimator systematically underestimated 
the mean, and the magnitude of its error increased with 
increasing intrabin variance. The FNC estimator, i.e., the 
result of substituting a mean into a function to derive a 
level-4 variable, also had systematic errors that increased 
with increasing variance. 
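
The dependence of the MED error on variance can be made explicit in an idealized case. The relation below assumes, for illustration only, that the data in a sampling domain are exactly lognormal and that the MED estimator equals exp(mu), where mu and sigma^2 are the mean and variance of ln X; the study itself found the distributions to be only approximately lognormal.

```latex
% Idealized lognormal case: X > 0 with \ln X \sim N(\mu, \sigma^2)
\begin{align}
  E[X] &= e^{\mu + \sigma^{2}/2}, &
  \operatorname{median}(X) &= e^{\mu}, \\
  \frac{\operatorname{median}(X) - E[X]}{E[X]} &= e^{-\sigma^{2}/2} - 1 \le 0 .
\end{align}
```

Under this assumption the relative error of the median is always negative and grows in magnitude as sigma^2 increases, which is consistent with the systematic underestimation reported for the MED estimator.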

In the case of satellite data from the same scene (spa- 
tial statistics), the MLE estimator proved to be nearly 
identical to the AVG estimator when sample sizes were 
large (n = 121). The same was true for the MLE4 and 

Fig. 18. Histograms of the relative error in MLE estimates of mean CHL at SEEP moorings. The top
histogram is for the weekly means (n = 213), calculated with 100% x (MLE - AVG)/AVG. The bottom
panel is for the monthly means (n = 74), also calculated with 100% x (MLE - AVG)/AVG.

Fig. 19. Histograms of the relative error in MED estimates of mean CHL at SEEP moorings. The top
histogram is for the weekly means (n = 213), calculated with 100% x (MED - AVG)/AVG. The bottom
panel is for the monthly means (n = 74), also calculated with 100% x (MED - AVG)/AVG.

Fig. 20. Histograms of the relative error in MLE estimates of mean K490 at SEEP moorings. The top
histogram is for the weekly means (n = 213), calculated with 100% x (MLE - AVG)/AVG. The bottom
panel is for the monthly means (n = 74), also calculated with 100% x (MLE - AVG)/AVG.






Fig. 21. Histograms of the relative error in MED estimates of mean K490 at SEEP moorings. The top
histogram is for the weekly means (n = 213), calculated with 100% x (MED - AVG)/AVG. The bottom
panel is for the monthly means (n = 74), also calculated with 100% x (MED - AVG)/AVG.

Fig. 22. Histograms of the relative error in MLE estimates of mean CHL/K490 at SEEP moorings. The
top histogram is for the weekly means (n = 213), calculated with 100% x (MLE - AVG)/AVG. The bottom
panel is for the monthly means (n = 74), also calculated with 100% x (MLE - AVG)/AVG.

Fig. 23. Histograms of the relative error in MED estimates of mean CHL/K490 at SEEP moorings. The
top histogram is for the weekly means (n = 213), calculated with 100% x (MED - AVG)/AVG. The bottom
panel is for the monthly means (n = 74), also calculated with 100% x (MED - AVG)/AVG.

Fig. 24. Histograms of the relative error in FNC(AVG) estimates of mean CHL/K490 at SEEP moorings.
The top histogram is for the weekly means (n = 213), calculated with 100% x (FNC - AVG)/AVG. The
bottom panel is for the monthly means (n = 74), also calculated with 100% x (FNC - AVG)/AVG.

Fig. 25. Histograms of the relative error in FNC(MLE) estimates of mean CHL/K490 at SEEP moorings.
The top histogram is for the weekly means (n = 213), calculated with 100% x (FNC - AVG)/AVG. The
bottom panel is for the monthly means (n = 74), also calculated with 100% x (FNC - AVG)/AVG.

Table 5a. Summary of relative errors for weekly means derived from SEEP mooring data: results for CHL.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference   of Weeks      [%] (rms) [%]   Minimum  Maximum

MLE        1                 39    -0.33      2.64       -13        4
           2                 18     0.02      0.06         0        0
           3                 53     0.18      1.28        -2        9
           5                 53     0.24      1.21        -2        6
           6                 37     0.47      2.27        -3       12
           8                 13     0.26      2.33        -3        7
           Combined         213     0.14      1.79       -13       12

MED        1                 39    -5.32     10.83       -41        0
           2                 18    -1.17      1.44        -3        0
           3                 53    -4.01      7.26       -27        0
           5                 53    -6.07      9.06       -42        0
           6                 37    -7.15     11.07       -34        0
           8                 13   -10.51     12.81       -22       -1
           Combined         213    -5.46      9.24       -42        0

Table 5a. (cont.) Summary of relative errors for monthly means derived from SEEP mooring data: results
for CHL.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference  of Months      [%] (rms) [%]   Minimum  Maximum

MLE        1                 13    -1.69      4.90        -1       16
           2                  6     0.06      0.18         0        0
           3                 17     0.91      4.09        -3       14
           5                 16     0.98      2.53        -2        8
           6                 17     2.29      6.02       -12       12
           8                  5     1.75      2.88         0        5
           Combined          74     1.37      4.18       -12       16

MED        1                 13    -7.59     13.75       -42       -1
           2                  6    -2.27      2.70        -5        0
           3                 17   -10.07     15.02       -37       -1
           5                 16    -9.24     11.10       -28       -1
           6                 17   -17.55     21.92       -39       -1
           8                  5   -11.92     15.18       -24       -3
           Combined          74   -10.66     15.24       -42        0

Table 5b. Summary of relative errors for weekly means derived from SEEP mooring data: results for K490.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference   of Weeks      [%] (rms) [%]   Minimum  Maximum

MLE        1                 39    -0.18      0.89        -4        1
           2                 18    -0.03      0.15         0        0
           3                 53     0.03      0.29        -1        1
           5                 53    -0.01      0.41        -1        1
           6                 37    -0.11      0.57        -1        1
           8                 13     0.06      0.70        -1        2
           Combined         213    -0.04      0.54        -4        2

MED        1                 39    -2.52      4.72       -17        0
           2                 18    -0.72      0.90        -2        0
           3                 53    -1.65      3.05       -15        0
           5                 53    -2.62      3.81       -17        0
           6                 37    -2.92      4.95       -18        1
           8                 13    -4.69      5.75       -10        0
           Combined         213    -2.38      4.02       -18        1

Table 5b. (cont.) Summary of relative errors for monthly means derived from SEEP mooring data: results
for K490.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference  of Months      [%] (rms) [%]   Minimum  Maximum

MLE        1                 13     0.27      0.75         0        3
           2                  6    -0.09      0.26         0        0
           3                 17    -0.14      0.70        -2        2
           5                 16     0.20      0.47         0        1
           6                 17    -0.20      1.36        -5        2
           8                  5     0.15      0.56        -1        1
           Combined          74     0.01      0.82        -5        3

MED        1                 13    -3.94      7.19       -22        0
           2                  6    -1.42      1.68        -3       -1
           3                 17    -4.49      6.01       -15       -1
           5                 16    -4.06      5.06       -13        0
           6                 17    -5.79      7.88       -20        0
           8                  5    -4.98      6.43       -10       -2
           Combined          74    -4.38      6.25       -22        0

Table 5c. Summary of relative errors for weekly means derived from SEEP mooring data: results for IC_K.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference   of Weeks      [%] (rms) [%]   Minimum  Maximum

MLE        1                 39     0.00      0.13        -1        0
           2                 18    -0.01      0.03         0        0
           3                 53     0.09      0.55         0        4
           5                 53     0.04      0.11         0        1
           6                 37     0.16      0.69         0        4
           8                 13     0.05      0.23         0        1
           Combined         213     0.06      0.40        -1        4

MED        1                 39    -0.73      1.93        -9        0
           2                 18    -0.08      0.10         0        0
           3                 53    -0.74      2.09       -11        0
           5                 53    -0.95      1.90       -11        0
           6                 37    -1.40      2.71       -12        0
           8                 13    -1.44      1.96        -5        0
           Combined         213    -0.89      2.03       -12        0

Table 5c. (cont.) Summary of relative errors for monthly means derived from SEEP mooring data: results
for IC_K.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference  of Months      [%] (rms) [%]   Minimum  Maximum

MLE        1                 13     0.12      0.33         0        1
           2                  6    -0.03      0.03         0        0
           3                 17     0.25      0.70         0        2
           5                 16     0.11      0.29         0        1
           6                 17     0.95      1.51         0        3
           8                  5     0.40      0.86         0        2
           Combined          74     0.35      0.84         0        3

MED        1                 13    -1.15      2.66        -9        0
           2                  6    -0.17      0.20         0        0
           3                 17    -1.91      3.89       -11        0
           5                 16    -1.40      1.91        -6        0
           6                 17    -5.09      7.13       -14        0
           8                  5    -2.26      3.12        -6       -1
           Combined          74    -2.28      4.17       -14        0

Table 5d. Summary of relative errors for weekly means derived from SEEP mooring data: results for FNC
estimators of IC_K.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference   of Weeks      [%] (rms) [%]   Minimum  Maximum

FNC(AVG)   1                 39     2.90      7.25         0       30
           2                 18     0.40      0.52         0        1
           3                 53     1.90      3.63         0       13
           5                 53     3.01      5.16         0       28
           6                 37     3.53      5.71         0       19
           8                 13     5.32      6.62         0       11
           Combined         213     2.72      5.24         0       30

FNC(MLE)   1                 39     2.65      5.85         0       24
           2                 18     0.45      0.56         0        1
           3                 53     2.07      4.32         0       18
           5                 53     3.29      6.06         0       35
           6                 37     4.16      7.28         0       28
           8                 13     5.53      7.23         1       17
           Combined         213     2.92      5.66         0       35

Table 5d. (cont.) Summary of relative errors for monthly means derived from SEEP mooring data: results
for FNC estimators of IC_K.

Estimator  Mooring       Number     Bias     Error            Range
Used       Reference  of Months      [%] (rms) [%]   Minimum  Maximum

FNC(AVG)   1                 13     3.62      7.23         0       23
           2                  6     0.70      0.90         0        1
           3                 17     5.09      8.59         0       25
           5                 16     4.46      5.43         1       14
           6                 17     9.76     12.62         1       26
           8                  5     5.96      7.85         1       13
           Combined          74     5.47      8.45         0       26

FNC(MLE)   1                 13     5.28     12.14         0       40
           2                  6     0.86      1.02         0        2
           3                 17     6.28     11.26         0       31
           5                 16     5.32      7.42         0       23
           6                 17    12.52     16.30         0       29
           8                  5     7.72     10.60         2       18
           Combined          74     6.99     11.46         0       40

Fig. 27. Results of correcting for bad data in the time series at mooring 6 (Fig. 26). In the upper panel,
which displays a record of the 10 AM CHL measurements versus time at mooring 6, the dark squares are data
missing from the original record. The open squares near zero are probably bad data. Cumulative means
derived after removing these low values are shown in the bottom panel.

Fig. 28. Simulated satellite CHL time series and cumulative mean CHL for summer deployment of mooring
3 (Jun.-Oct. 1988). The upper panel displays 10 AM CHL measurements, and the lower panel displays
cumulative means for the data shown in the upper panel.

Fig. 30. Cumulative mean CHL in CZCS scene 4 based on LAC data within boxes of increasing area (L x L)
plotted against length, L. Results for boxes in the northern nearshore region (upper panel) and for the
southern offshore region (lower panel).

Fig. 31. Cumulative mean CHL in CZCS scene 4, in this case based on GAC data, within boxes of increasing
area (L x L) plotted against length, L. Results for boxes in the northern nearshore region (upper panel) and
for the southern offshore region (lower panel).

Table 6. Comparison of cumulative means derived from CZCS data for largest areas (L x L). The length (L)
is shown in column 2. Blank cells are values missing from the source.

CZCS     Length     Number       Estimator          % Error
Scene      [km]  of Pixels      AVG     AVG4      AVG     AVG4

1           479    336,494    0.059    0.059      0.0      1.0
2           463    314,643    0.061    0.061      0.0      0.7
3           441    286,176    0.166    0.163      0.0     -1.5
4-N         222     72,381    0.297    0.302      0.0      1.4
4-S         460    310,518    0.143    0.140      0.0     -2.4
5-N         480    339,127    0.298    0.297      0.0     -0.4
5-S         291    124,475    0.384    0.382      0.0     -0.4
6           224     73,390    0.629    0.634      0.0      0.1
7-N         361    191,092    0.919    0.933      0.0      1.5
7-S         238     82,714    1.052    1.072      0.0      1.9

CZCS     Length     Number       Estimator          % Error
Scene      [km]  of Pixels      MLE     MLE4      MLE     MLE4

1           479    336,494    0.058    0.059     -0.2      1.0
2           463    314,643    0.061    0.061     -0.2      0.5
3           441    286,176    0.166    0.164      0.4     -1.0
4-N         222     72,381    0.276    0.276     -7.1     -7.3
4-S         460    310,518    0.144    0.140      0.3     -2.2
5-N         480    339,127    0.297    0.297     -0.5     -0.4
5-S         291    124,475    0.386    0.385      0.5      0.2
6           224     73,390    0.531    0.537    -15.5    -14.7
7-N         361    191,092    0.883    0.908     -3.9     -1.2
7-S         238     82,714    0.973    0.992     -7.4     -5.7

CZCS     Length     Number       Estimator          % Error
Scene      [km]  of Pixels      MED     MED4      MED     MED4

1           479    336,494    0.057    0.058
2           463    314,643    0.059    0.059    -3.93    -3.28
3           441    286,176    0.157    0.155             -6.46
4-N         222     72,381    0.229    0.224   -22.97   -24.71
4-S         460    310,518    0.136    0.132    -5.31    -7.75
5-N         480    339,127    0.287    0.287             -3.79
5-S         291    124,475    0.335    0.335   -12.76   -12.87
6           224     73,390    0.192    0.194   -69.47   -69.17
7-N         361    191,092    0.488    0.490   -46.86   -46.72
7-S         238     82,714    0.582    0.583   -44.70   -44.56

AVG4 estimators which were based on much smaller sam- 
ples (n < 9). In both cases, differences were less than 
±2% (Fig. 8 and Fig. 10). These results differ somewhat 
from those of Baker and Gibson (1987) who found that 
the arithmetic average underestimated the true mean of 
a lognormal variate, and that the maximum likelihood es- 
timator was a better estimator of the mean when sam- 
ple sizes were small. In the small samples that resulted 
from using GAC data, both the MLE4 and AVG4 estima- 
tors had a slight tendency to underestimate the true mean 
(AVG), as indicated by their small negative biases (Fig. 9, 
and Figs. 11-14), but no significant difference was found 
between the two estimators. 

In the case of weekly and monthly means derived from 
the SEEP II data, the MLE and AVG estimators again 
proved to be nearly identical. The AVG estimator was 
nominally the true mean, but since it was based on small 
samples (7 days for weekly means and 31 or fewer days 
for monthly means), it is not necessarily better than other 
estimators of the mean. 

Although the MLE and AVG estimators are equivalent 
with respect to accuracy, it was recommended that the 
MLE estimator be used because of its flexibility in allowing 
the estimation of level -4 variables from saved statistics of 
level -3 variables. In the remainder of this discussion, two 
questions are raised regarding the equivalence of the MLE 
and AVG estimators, and the answers discussed. 

The first question is: How important is the assumption 
that the variable is lognormally distributed? If the variable 
being sampled is lognormally distributed, then the MLE 
estimator (5) is the maximum likelihood estimator of the 
mean, and the estimators for the standard deviation (23), 
median (24), and mode (25) are also maximum likelihood 
estimators of these parameters. But what if the underlying 
distribution is not lognormally distributed? How robust 
will the estimator be if the lognormal assumption is not 
valid? 

The empirical evidence based on CZCS data and the 
SEEP II time series data supports the use of the MLE 
estimator. These data sets taken as a whole, i.e., a whole 
CZCS scene or a 16-month record from a single moored 
fluorometer, were approximately lognormal or mixtures of 
lognormal distributions. This was demonstrated for both 
satellite and in situ CHL distributions (Figs. 4 and 17), 
and observed for the other variables, but not shown. It 
is not surprising, therefore, that small subsets drawn from 
the whole data set behave as random samples drawn from 
a lognormal distribution. 

However, the binned data were not random samples. 
Instead, they consisted of measurements made close to- 
gether in space or time, and thus, they were correlated. 
To the extent that the binned data are positively corre- 
lated, the intrabin variance will be less than the variance 
of a random sample of the same size drawn from the whole 
data set. 

It is possible to show that the MLE estimator will be a 
good approximation to the mean of any distribution with 
$s^2 < 0.5$, where $s^2$ is the variance of the natural logarithm 
of the variable. This result is derived from the series 
expansion for the exponential function 

$$ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \tag{46} $$

Let X be any random variable whose distribution is 
unknown. Define $x = \ln(X)$, and let $m$ and $s^2$ be the 
mean and variance of $x$. Then 

$$ X = e^x = e^m e^{(x-m)}, \tag{47} $$

and the expected value of X is 

$$ E[X] = e^m E\left[1 + (x-m) + \frac{(x-m)^2}{2!} + \frac{(x-m)^3}{3!} + \cdots\right] = e^m \left[1 + \frac{m_2}{2!} + \frac{m_3}{3!} + \cdots\right], \tag{48} $$

where $m_i$ denotes the ith central moment of $x$, defined by 

$$ m_i = E[(x-m)^i]. \tag{49} $$

It is also noted that $m_1 = 0$ and $m_2 = s^2$. 


If the variance is less than or equal to 0.5, then the 
terms involving higher order central moments will be a 
rapidly decreasing series. In fact, the series in brackets in 
(48) can be approximated by its first two terms 

$$ E[X] \approx e^m \left[1 + \frac{s^2}{2}\right] \approx e^m e^{\frac{1}{2}s^2}. \tag{50} $$

The term on the right is the MLE estimator of the mean. 
Thus, there are two situations when the MLE estimator is 
valid: 1) when the underlying distribution is lognormal, or 
2) when the variance of the natural logarithm of the 
variable is less than or equal to 0.5 (or the standard deviation 
of the base-10 logarithm is less than or equal to 0.3). 
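
The robustness argument above can be checked numerically. The following is a minimal sketch, not part of the operational software: it assumes, purely for illustration, that x = ln(X) is uniformly distributed (a clearly non-normal case) with mean m and variance s2, for which the exact mean of X is known in closed form. The function names are hypothetical.

```python
import math


def mle_mean(m: float, s2: float) -> float:
    """MLE-style estimator of the mean of X: exp(m + s2/2)."""
    return math.exp(m + 0.5 * s2)


def uniform_log_true_mean(m: float, s2: float) -> float:
    """Exact E[X] when x = ln(X) is uniform with mean m and variance s2.

    A uniform variate on [m - h, m + h] has variance h**2 / 3, and
    E[exp(x)] = exp(m) * sinh(h) / h.  (Illustrative assumption only.)
    """
    h = math.sqrt(3.0 * s2)
    if h == 0.0:
        return math.exp(m)
    return math.exp(m) * math.sinh(h) / h


def relative_error(m: float, s2: float) -> float:
    """Relative error of the MLE estimator against the exact mean."""
    true_mean = uniform_log_true_mean(m, s2)
    return (mle_mean(m, s2) - true_mean) / true_mean


if __name__ == "__main__":
    for s2 in (0.1, 0.5, 2.0):
        err = 100.0 * relative_error(-2.0, s2)
        print(f"s2 = {s2:3.1f}  relative error = {err:6.2f}%")
```

Under this assumed distribution the relative error stays on the order of 1% at a log-variance of 0.5 and grows quickly once the variance is well above that threshold, which is the behavior the series argument predicts.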

Figure 32 is a plot of the average variance of the 
logarithm of CHL within bins of size L x L plotted as a function 
of L for the CZCS scenes 1-5. The variance within 
9 x 9 km$^2$ bins was less than 0.5 for all five scenes, 
and in every scene except scene 4 it remained less than 0.5 
as L increased up to the maximum length of 480 km. In 
scene 4, variances exceeded 0.5 at L greater than about 100 km. 

The second question is: Under what circumstances do 
the MLE and AVG estimators disagree, and is it possible 
to predict the nature and magnitude of their differences? 

In the study of cumulative means (Figs. 26-31), there 
were examples shown where the MLE and AVG estima- 
tors began to diverge as the size of the sampling domain 
increased. In one example (Figs. 26-27), the divergence 
could be associated with bad data, and the conclusion 
was that the MLE estimator was sensitive to anomalously 
low values. The possibility that similar errors might affect 
level -3 SeaWiFS data should be considered. 

The discussion related to the first question suggests 
another circumstance in which the MLE and AVG estima- 
tors might disagree: the situation where the variance of 
the logarithm is large and the variable is not lognormally 
distributed. A situation such as this would occur when the 
sampling domain contains a mixture of lognormal distri- 
butions. In the case of spatial statistics, this would occur 
in frontal areas between sharply contrasting water types, 
e.g., high-chlorophyll waters mixing with low-chlorophyll 
waters. It is likely to be more common in sampling 
domains covering longer time periods. 

Most of the CZCS scenes can be modeled as mixtures 
of lognormal distributions. Table 7 lists the means and 
variances of lognormal distributions that were fit to modes 
of the histograms shown in Fig. 4. Values of CHL derived 
according to the CHL23 formula were excluded from the 
fits. Note that within all modes, the variance was less 
than 0.5. However, when two or more modes are mixed, 
the variance of the mixture distribution will be increased 
due to differences between modes. 

Table 7. Results of fitting normal distributions to the modes of the histograms of log(CHL) in the CZCS scenes 
analyzed. 

Scene         Mode 1            Mode 2            Mode 3
Number      m       s^2       m       s^2       m       s^2
1         -2.82    0.04
2         -2.98    0.02     -2.68    0.06
3         -2.41    0.04     -1.70    0.04
4         -2.35    0.06     -1.81    0.06     -1.06    0.16
5         -2.38    0.07     -1.25    0.08     -0.63    0.04
6         -2.65    0.09     -1.58    0.45      0.22    0.09
7         -1.78    0.15     -1.00    0.18     -0.07    0.15

Fig. 32. Average variance of ln(CHL) within areas of size L x L, as a function of L for CZCS scenes 1-5. 
Results for scenes 6 and 7 were not obtained because of the discontinuity in the CHL distributions in these 
scenes, which is an artifact of the bifurcated CZCS algorithm. 

It is possible to quantify errors associated with the 
MLE estimator in the case of mixture distributions. An 
example is the case where there are two modes mixing in 
a sampling domain. Let each mode be a lognormal 
distribution with parameters $m_i$ and $s_i^2$, where i = 1 or 2. If P 
is the proportion of the distribution that is mode 1, then 
the mean of the distribution is 

$$ X_{avg} = P e^{(m_1 + \frac{1}{2}s_1^2)} + (1-P) e^{(m_2 + \frac{1}{2}s_2^2)}, \tag{51} $$

and the MLE estimator is 

$$ X_{mle} = \exp\left\{ P\left(m_1 + \frac{s_1^2}{2}\right) + (1-P)\left(m_2 + \frac{s_2^2}{2}\right) + P(1-P)\frac{(m_1 - m_2)^2}{2} \right\}. \tag{52} $$


Relative errors for pair-wise mixtures of the modes 
listed in Table 7 are plotted against P in Fig. 33, where 
$m_1 < m_2$, and P is the proportion of the lower-chlorophyll mode. 
There are 14 curves shown in this figure, but only 5 have 
errors that are significantly different from zero. The largest 
errors (differences between MLE and AVG) occurred when 
modes from scene 6 were mixed, and especially when mode 
1 (mean CHL = 0.07 mg m$^{-3}$) was mixed with mode 3 
(mean CHL = 1.3 mg m$^{-3}$). Of all the cases considered 
here, the highest positive error (40%) occurred when 30% 
of mode 1 was mixed with 70% of mode 3 in scene 6, and 
the highest negative error (-30%) occurred when 90% of 
mode 1 was mixed with 10% of mode 3. 
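
The relative errors quoted above can be reproduced directly from (51) and (52). The sketch below is illustrative only; the function names are hypothetical, and the mode parameters are the scene 6 values of m and s^2 as listed in Table 7.

```python
import math


def mixture_avg(p, m1, s1_sq, m2, s2_sq):
    """True mean of a two-mode lognormal mixture, following (51)."""
    return (p * math.exp(m1 + 0.5 * s1_sq)
            + (1.0 - p) * math.exp(m2 + 0.5 * s2_sq))


def mixture_mle(p, m1, s1_sq, m2, s2_sq):
    """MLE estimator applied to the mixture, following (52).

    The mean and variance of ln(X) for the mixture are formed first;
    the variance includes the between-mode term P(1-P)(m1-m2)^2.
    """
    m = p * m1 + (1.0 - p) * m2
    s_sq = p * s1_sq + (1.0 - p) * s2_sq + p * (1.0 - p) * (m1 - m2) ** 2
    return math.exp(m + 0.5 * s_sq)


def relative_error(p, m1, s1_sq, m2, s2_sq):
    """(MLE - AVG) / AVG for a mixture with proportion p of mode 1."""
    avg = mixture_avg(p, m1, s1_sq, m2, s2_sq)
    return (mixture_mle(p, m1, s1_sq, m2, s2_sq) - avg) / avg


# Scene 6, modes 1 and 3, from Table 7: (m, s^2)
SCENE6_MODE1 = (-2.65, 0.09)
SCENE6_MODE3 = (0.22, 0.09)

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        err = relative_error(p, *SCENE6_MODE1, *SCENE6_MODE3)
        print(f"P = {p:.1f}  relative error = {100.0 * err:6.1f}%")
```

With these parameters the error is close to +40% at P = 0.3 and a little under -30% in magnitude at P = 0.9, in line with the values cited in the text.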

The patterns shown here indicate that the MLE can 
either underestimate or overestimate the true mean when there are 
mixtures of lognormal distributions within the sampling 
domain. The MLE estimator tended to exceed AVG for 
low values of P, whereas AVG exceeded MLE for high 
values of P. The former was the situation with the SEEP data, 
where a few low values in the data record caused MLE(n) 
and AVG(n) to diverge, with MLE(n) > AVG(n). Likewise, the 
opposite seemed to be the case with CZCS data, where there 
were relatively few high values, e.g., values derived using 
the CHL23 algorithm, that brought about divergences with 
MLE(L) < AVG(L). 

Fig. 33. Relative errors (%) in the MLE estimator resulting from mixtures of two lognormal distributions, 
plotted against P, the proportion of the mixture derived from the lower-chlorophyll mode. The 14 cases 
depicted in this figure represent all pair-wise combinations of modes within the seven CZCS scenes (see Table 7). 

The situations depicted in Fig. 33 may not be inclusive 
of all possible mixtures that would occur in nature, but 
they do span the range in the seven scenes analyzed. It is 
clear that patterns are complex, and yet, reassuring that 
with very few exceptions, errors were within ±10%. 

3.4 Conclusions 

The MLE estimator is a reasonably accurate estima- 
tor for the mean of CHL and other satellite-derived vari- 
ables within sampling domains. It behaves as well as the 
arithmetic average, and yet it has an advantage over the 
AVG estimator in that it can be used to estimate means of 
a large class of level -4 variables derived from the level -3 
data. There were two situations that assure agreement 
between the AVG and MLE estimators: 1) if the variable is 
lognormally distributed within the sampling domain, or 2) 
if its variance is low. If the variance of its natural 
logarithm is less than 0.5, then AVG and MLE should agree 
regardless of the underlying distribution. 

Two circumstances were identified where the MLE and 
AVG estimators are expected to disagree. One is the case 
where there are anomalously low values in the data (pre- 
sumably bad data), and the other is where the sampling do- 
main contains a mixture of lognormal distributions. Based 
on mixtures found in seven CZCS scenes spanning a wide 
variety of ocean environments, relative errors would typi- 
cally be within ±10%. 
Appendices 

A. Equal-area Gridding Scheme for SeaWiFS Binned 
Data 

B. Scheme for Weighting Data 

C. Algorithms for Binning and Interpreting SeaWiFS 
Binned Data 

Appendix A 

Equal-area Gridding Scheme for SeaWiFS Binned Data 

Introduction: This appendix† describes the equal-area gridding 
scheme developed by the RSMAS Remote Sensing Group for 
binned ocean fields. The same approach has been adopted for 
AVHRR Ocean Pathfinder SST products and is proposed for 
MODIS. The gridding scheme is based on that adopted by the 
International Satellite Cloud Climatology Project (ISCCP). 

This document does not motivate the need for an equal area 
grid for SeaWiFS or other oceanographic products. Such moti- 
vation can be found in a paper by W. Rossow and L. Gardner 
(1984). Furthermore, this document describes only the design 
of the proposed equal-area grid, and does not discuss other re- 
lated topics such as rules for spatially or temporally combining 
observations into the equal-area bins. 

Overview: The gridding scheme proposed consists of rec- 
tangular bins or tiles, arranged in zonal rows. A compromise 
between data processing and storage capabilities, on one hand, 
and the potential geophysical applications of satellite data, on 
the other hand, suggest that a suitable minimum bin size would 
be approximately 8-10 km on a side. 

In the scheme proposed here, the tiles are approximately 
9.28 km on a side. This size (9.28 km) was chosen because (a) 
it has approximately the desired minimum resolution, and (b) 
it results in 2,160 zonal rows of tiles from pole to pole, i.e., 1,080 
in each hemisphere. This particular number of rows (2,160) has 
some advantages which will be discussed in more detail below. 
Because the total number of rows is even, the bins will never 
straddle the equator, i.e., there will be an equal number of rows 
above and below the equator. This avoids possible situations 
where the Coriolis factor is zero, a characteristic that numerical 
modellers expect from any gridding scheme adopted. 

The total number of approximately 9 km bins is 5,940,422. 
The bins or tiles are arranged in a series of zonal rows; the 
number of tiles per row varies. The rows immediately above and 
below the equator have 4,320 tiles. This number is derived by 
dividing the perimeter of the Earth at the equator by the 
standard tile size, i.e., $2\pi R_e/9.28$, where $R_e$ is the equatorial radius 
of the Earth ($R_e$ = 6378.145 km). The number of tiles per row 
decreases approximately as a cosine function as the rows get 
closer to each pole (rigorously, there should be an adjustment 
for ellipticity of the Earth, as the equatorial radius decreases 
progressively to the smaller polar radius; this adjustment is 
not applied in the current implementation). At the poles, the 
number of tiles is always three. This special situation will be 
discussed in detail below. The number of tiles per row as a 
function of latitude is shown in Fig. A-1. 

The number of bins in each zonal row is always an integer. 
To ensure an integer number of bins, the width of each bin (the 
size of a bin along a parallel, or x-length) must vary slightly 
from row to row. The bins, however, are always 9.28 km long 
along the meridians. That is, only one of the bin dimensions 
changes. The size of the bins at each zonal row is established in 
the following manner. First, a preliminary value for the number 
of tiles ($N_p$) at a given latitude (L) is computed as 

$$ N_p = 2\pi r / X, $$

where X is the x-size of a bin at the equator (9.28 km) and r 
is the radius of the circle produced by slicing the Earth with a 
plane parallel to the equator at latitude L. The radius r can be 
calculated as 

$$ r = R_e \cos(L), $$

where $R_e$ is the equatorial radius of the Earth. If the fractional 
part of $N_p$ is greater than or equal to 0.5, then $N_p$ is rounded up 
to the nearest integer, i.e., the final number of tiles will be the 
integer portion of $N_p$ plus one; otherwise, $N_p$ is rounded down, 
i.e., the final number of tiles is the integer portion of $N_p$. Once 
the final integer number of tiles along a row is calculated, the 
x-size of the tiles must be adjusted. This is done by dividing 
the perimeter of the row ($2\pi r$) by the integer number of tiles. 
The result is the x-length (width) of a tile for a given row. 

† This text is courtesy of the Remote Sensing Group, Rosenstiel 
School of Marine and Atmospheric Science, University 
of Miami. 

Because the x-length of the tiles is adjusted to ensure an in- 
teger number at each row, the equal area characteristics of this 
binning scheme are not rigorously preserved. However, varia- 
tions in tile size are negligible throughout most of the globe 
and only become relevant at very high latitudes, where there 
are fewer tiles per row, and any adjustments are more notice- 
able. As the number of tiles increases with distance from the 
poles, the difference between tile sizes rapidly becomes practi- 
cally unnoticeable. To provide an idea of the magnitude of the 
fluctuations in tile size, the worst possible case occurs when half 
a tile remains uncovered after filling a zonal row with an inte- 
ger number of tiles. Once a row has 100 bins (approximately 16 
rows, or 148 km from the poles), the worst possible difference 
between the actual tile x-length and the standard x-length is of 
the order of 0.5%, i.e., half a tile’s length redistributed among 
about 100 tiles. For a tile of about 9 km a side, this represents 
a difference in the x-length of about 45 m. Through a similar 
calculation, a row with 50 bins (about 80 km away from the 
poles) has a 1% variation with respect to the standard bin size. 
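
The worst-case figures quoted above follow from spreading half a tile's length over the tiles in a row. A minimal sketch of that arithmetic is given here; the function names and the default tile size argument are illustrative choices, not part of the gridding software.

```python
def worst_case_width_error(tiles_in_row: int) -> float:
    """Worst-case fractional change in tile x-length for a row.

    Half a tile is left over after rounding, and that half tile is
    redistributed evenly among the tiles in the row.
    """
    if tiles_in_row <= 0:
        raise ValueError("a row must contain at least one tile")
    return 0.5 / tiles_in_row


def worst_case_width_error_m(tiles_in_row: int, tile_km: float = 9.28) -> float:
    """The same worst-case deviation expressed in meters."""
    return worst_case_width_error(tiles_in_row) * tile_km * 1000.0


if __name__ == "__main__":
    for n in (50, 100, 4320):
        pct = 100.0 * worst_case_width_error(n)
        print(f"{n:5d} tiles: {pct:.3f}% ({worst_case_width_error_m(n):.1f} m)")
```

For a row of 100 tiles and a nominal tile of about 9 km this gives 0.5% and 45 m, matching the text; at the equator (4,320 tiles) the deviation is about 1 m.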

The gridding scheme described here has an extremely useful 
feature. The number of rows of 9.28 km tiles in each hemisphere 
(1,080) is divisible by many numbers (e.g., 2, 3, 4, 5, 6); therefore, it is 
extremely easy to generate an integer number of rows at many 
useful spatial resolutions. For instance, 12 rows of 
approximately 9.28 km tiles can be combined to generate zonal bands 
of 1° (1° of latitude is equal to 111.12 km; 12 bins would form 
a band 111.20 km wide). Another example is the use of 30 rows 
to generate zonal bands of 2.5°, a typical output resolution of 
atmospheric circulation models. 

The poles: Both the North and South Poles are special cases 
in the gridding scheme presented here. The pole areas are 
always covered by three tiles shaped like pie sectors. While the 
meridional size of the polar bins (the y-length) will be the usual 
9.28 km, the length of the bins along the arc of the sectors will 
be slightly larger. Neglecting sphericity, the area encompassed 
by the last row of tiles is $\pi X^2$, where X = 9.28 km. If the 
area of the circle is expressed as a rectangle of height X, the 
remaining dimension is $\pi X$. If the perimeter is divided by three 
(to yield three tiles), each tile will have dimensions X by $\pi X/3$ 
(approximately 1.05X). Thus, the bases of the triangular polar 
tiles are about 5% larger than the x-length of the equatorial 
tiles. 

Binning software: Several routines have been developed to 
perform the principal transformations required for binning and 
mapping data, such as converting latitudes and longitudes into 
bin numbers. Other routines perform the inverse transforma- 
tion, i.e., given a bin number they return a latitude and longi- 
tude corresponding to the centroid of that bin. These routines 
use a common initialization routine that must be executed prior 
to calling the conversion routines. 

Two numbering schemes are used internally, corresponding 
to one- and two-dimensional (1-D and 2-D, respectively) 
accessing schemes. The 1-D scheme numbers all bins 
consecutively, beginning with 1 at the southernmost row and working 
eastward from -180° around each circle of latitude. The 2-D 
scheme uses a row number, from 1 to 2,160, and a number to 
indicate its location within the row, beginning at 1 for each 
row. 

Variable Dictionary: The variables and their definitions for the 
pseudocode are presented below. 

NUMROWS   The (integer) number of rows in the grid (equal to 
          2,160 for SeaWiFS). 

BASEBIN   An integer*4 array of size NUMROWS that contains 
          the index number of the first bin in each row. 

NUMBIN    An integer array of size NUMROWS containing the 
          total number of possible bins in each row. 

LATBIN    A real*4 array of size NUMROWS that contains the 
          center latitudes (decimal degrees) of the 
          corresponding BASEBINs. 

TOTBINS   The (integer*4) number of possible bins in the grid 
          (equal to 5,940,422 for NUMROWS=2,160). 

ROW       The row number (integer); range is 1 to NUMROWS. 

COL       The bin number (integer); the range is from 1 to 
          NUMBIN(ROW). 

IDX       The bin index number (integer*4); range is 1 to 
          TOTBINS. 

LAT       The input latitude (real*4) for obtaining the 
          corresponding bin's ROW and COL, or IDX; or the output 
          latitude for a bin specified by ROW and COL, or IDX. 
          (The range for LAT is -90 to +90 decimal degrees.) 

LON       The input longitude (real*4) for obtaining the 
          corresponding bin's ROW and COL, or IDX; or the 
          output longitude for a bin specified by ROW and 
          COL, or IDX. (The range for LON is -180 to 180 
          decimal degrees.) 

Pseudocode: The following pseudocode demonstrates the gen- 
eration of the grid and the calculations for determining the 
center latitude and longitude for a given bin and for identifying 
a bin given a latitude and a longitude. The algorithms are illus- 
trative in purpose and do not necessarily represent an optimal 
implementation. They are based on software developed by J. 
Brown, University of Miami. 

#
# Set up NUMBIN and BASEBIN arrays
#
BASEBIN(1) = 1
do from ROW = 1 to NUMROWS
    LATBIN(ROW) = ((ROW-0.5)*180.0/NUMROWS) - 90.0
    NUMBIN(ROW) = int((2*NUMROWS*cos_dbl_deg(LATBIN(ROW))) + 0.5)
    if ROW>1 then BASEBIN(ROW) = BASEBIN(ROW-1) + NUMBIN(ROW-1)
end do
TOTBINS = BASEBIN(NUMROWS) + NUMBIN(NUMROWS) - 1
#
# Identify bin from lat (-90 to +90) and lon (-180 to 180)
#
ROW = integer((90.0+LAT)*(NUMROWS/180.0)) + 1
ROW = minimum(ROW, NUMROWS)
LON = LON + 180.0
COL = integer(LON*NUMBIN(ROW)/360.0) + 1
COL = minimum(COL, NUMBIN(ROW))
IDX = BASEBIN(ROW) + COL - 1
#
# Get bin center lat/lon for given bin index
#
ROW = NUMROWS
IDX = maximum(IDX, 1)
do while IDX<BASEBIN(ROW)
    ROW = ROW - 1
end do
LAT = LATBIN(ROW)
LON = 360.0*(IDX-BASEBIN(ROW)+0.5)/NUMBIN(ROW)
LON = LON - 180.0
#
# Get bin center lat/lon for given bin row/column
#
LAT = LATBIN(ROW)
LON = 360.0*(COL-0.5)/NUMBIN(ROW)
LON = LON - 180.0
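
For readers who want to experiment with the grid, the pseudocode above translates almost line for line into Python. The following is a sketch under stated assumptions rather than the operational implementation: the function names are hypothetical, arrays are 0-based internally while bin and row numbers stay 1-based as in the pseudocode, and a binary search replaces the backward row scan.

```python
import math
from bisect import bisect_right

NUMROWS = 2160  # number of zonal rows for the SeaWiFS grid


def build_grid(numrows=NUMROWS):
    """Return (LATBIN, NUMBIN, BASEBIN, TOTBINS) for the equal-area grid."""
    latbin, numbin, basebin = [], [], []
    next_base = 1  # 1-based index of the first bin in the current row
    for row in range(1, numrows + 1):
        lat = (row - 0.5) * 180.0 / numrows - 90.0
        n = int(2 * numrows * math.cos(math.radians(lat)) + 0.5)
        latbin.append(lat)
        numbin.append(n)
        basebin.append(next_base)
        next_base += n
    return latbin, numbin, basebin, next_base - 1


LATBIN, NUMBIN, BASEBIN, TOTBINS = build_grid()


def latlon_to_bin(lat, lon):
    """Bin index (1..TOTBINS) containing the given latitude and longitude."""
    row = min(int((90.0 + lat) * (NUMROWS / 180.0)) + 1, NUMROWS)
    n = NUMBIN[row - 1]
    col = min(int((lon + 180.0) * n / 360.0) + 1, n)
    return BASEBIN[row - 1] + col - 1


def bin_to_latlon(idx):
    """Center latitude and longitude of the bin with index idx."""
    idx = max(idx, 1)
    # Count of rows whose first bin is <= idx, i.e. the 1-based row number.
    row = bisect_right(BASEBIN, idx)
    n = NUMBIN[row - 1]
    lon = 360.0 * (idx - BASEBIN[row - 1] + 0.5) / n - 180.0
    return LATBIN[row - 1], lon


if __name__ == "__main__":
    print("total bins:", TOTBINS)
    print("bins in polar row:", NUMBIN[0], " bins in equatorial row:", NUMBIN[NUMROWS // 2])
    idx = latlon_to_bin(35.0, -75.0)
    print("bin for (35N, 75W):", idx, "center:", bin_to_latlon(idx))
```

Converting a bin index to its center coordinates and back should return the same index, which gives a simple consistency check on any implementation of the two transformations.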


Appendix B 

Scheme for Weighting Data 

This appendix describes the scheme used to weight data from 
different times (orbits) in producing temporal means and vari- 
ances. The level -2 SeaWiFS data will be log-transformed before 
the following schemes are applied. Note that the lower case let- 
ter x is used to denote the natural logarithm of the variable 
X, that is, x = ln(X). The MLE estimator for the mean of 
a lognormal variable X requires that the maximum likelihood 
estimators of the mean and variance of x be obtained first. 


The Textbook Case for Unweighted Data: If the data within a 
sampling domain, $x_i$, $i = 1, \ldots, n$, are independent and 
identically distributed normal random variables with a true mean $\mu$ 
and variance $\sigma^2$, then the sample mean 

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{B1} $$

is the maximum likelihood estimator of $\mu$. The sample variance, 
defined as 

$$ s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{B2} $$

and computed as 

$$ s^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2, \tag{B3} $$

is the maximum likelihood estimator of the variance, $\sigma^2$. Note 
that $s^2$ is not the more common unbiased estimator of the 
variance which is obtained by multiplying (B3) by n/(n - 1). For 
the specific case of SeaWiFS spatial statistics, i.e., for data 
within a bin obtained during the same orbital pass, equations 
(B1) and (B3) will be used to compute the mean and variance 
of x = ln(X), for each variable X. 

The General Case for Weighted Data: Let $w_i$ be the weight 
given to the ith observation (data point). The weighted mean 
and variance analogous to (B1) and (B3) are 

$$ \bar{x} = \frac{1}{W} \sum_{i=1}^{n} w_i x_i \tag{B4} $$

$$ s^2 = \frac{1}{W} \sum_{i=1}^{n} w_i x_i^2 - \bar{x}^2 \tag{B5} $$

where W is the sum of the weights 

$$ W = \sum_{i=1}^{n} w_i. \tag{B6} $$
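
As an illustration of (B4)-(B6), the weighted mean and variance can be computed as in the following sketch. The function name is hypothetical, and the inputs are assumed to be log-transformed values x = ln(X) with non-negative weights; with all weights equal the result reduces to (B1) and (B3).

```python
def weighted_mean_variance(x, w):
    """Weighted mean and (maximum likelihood) variance per (B4)-(B6).

    x: sequence of log-transformed observations, x = ln(X)
    w: sequence of non-negative weights, one per observation
    """
    if len(x) != len(w) or len(x) == 0:
        raise ValueError("x and w must be non-empty and of equal length")
    total_w = sum(w)                                  # (B6)
    if total_w <= 0:
        raise ValueError("the sum of the weights must be positive")
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total_w             # (B4)
    var = sum(wi * xi * xi for wi, xi in zip(w, x)) / total_w - mean * mean  # (B5)
    return mean, var


if __name__ == "__main__":
    print(weighted_mean_variance([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
    print(weighted_mean_variance([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))
```

Note that the variance is the maximum likelihood form (divisor W), not the unbiased form, in keeping with the convention stated for (B3).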


The Specific Case for Weighted Data: How this applies to the 
weighting of spatial statistics as they are binned over time is 
considered here. In general, there will be N sets of spatial 
statistics, each corresponding to a time $t_i$, $i = 1, \ldots, N$, and each 
set of spatial statistics will be based on $n_i$ observations from 
the same orbital pass. To be obtained is a weighted mean and 
variance of the data over observations $X_{ij}$, where j refers to the 
jth observation at time $t_i$, $i = 1, \ldots, N$, and $j = 1, \ldots, n_i$. 
(Recall that $x_{ij} = \ln(X_{ij})$.) 

One approach would be to compute a mean, $\bar{x}_i$, and variance, 
$s_i^2$, for each set of spatial statistics, and then simply average 
the means and variances over all times, $t_i$, $i = 1, \ldots, N$. If this 
approach is used, the weights applied to each observation would 
be 

$$ w_{ij} = \frac{1}{n_i}. \tag{B7} $$

It was decided that this gave too much weight to data sets 
having few observations. The alternative is to weight all data 
equally, but this gives too much weight to the data sets with 
numerous observations. The compromise was to use 

$$ w_{ij} = \frac{1}{\sqrt{n_i}}. \tag{B8} $$

With these weights, (B4) and (B5) become 

$$ \bar{x} = \frac{1}{W} \sum_{i=1}^{N} \frac{1}{\sqrt{n_i}} \sum_{j=1}^{n_i} x_{ij} \tag{B9} $$

$$ s^2 = \frac{1}{W} \sum_{i=1}^{N} \frac{1}{\sqrt{n_i}} \sum_{j=1}^{n_i} x_{ij}^2 - \bar{x}^2 \tag{B10} $$

where $W = \sum_{i=1}^{N} \sqrt{n_i}$ is the sum of the weights. 
Equations (B9) and (B10) will be used to obtain the temporal 
statistics of ln(X) in each sampling domain. 


Appendix C 

Algorithms for Binning and Interpreting 
SeaWiFS Binned Data 

Three algorithms are described and their pseudocodes presented 
in this appendix: the Space Binner algorithm bins data from a 
single scene; the Time Binner algorithm bins output from the 
Space Binner (or from the Time Binner) to accumulate sums 
over binning periods; and the Bin Data Interpreter is used to 
interpret binned data products to derive the mean, standard 
deviation, median and mode of level-3 data. Only GAC data will 
be binned operationally to generate archived level-3 products. 

Spatial Binning Algorithm: The spatial binning algorithm is 
applied to the level-2 GAC scenes. In general, there will be 
one set of spatial statistics created for each scene. The only 
exception will be when an orbit crosses 180° longitude, in which 
case there will be two sets of spatial statistics corresponding to 
different days. 

Let $X_{ji}$ be an acceptable observation of the variable $X_j$ in pixel 
i, and let LON(i) and LAT(i) be the longitude and latitude at the 
center of pixel i. (A pixel will be considered to have 
acceptable level-2 data if it passes screens for sun glint, clouds and 
other masks, in which case all of the variables will be considered 
acceptable.) From these coordinates, the bin index number b 
will be determined according to known relationships (see 
Appendix A). 

Then for each variable j, the natural logarithm LOGX = $\ln(X_{ji})$ 
is obtained, and the following sums incremented 

$$ \mathrm{SUMX}(b,j) = \mathrm{SUMX}(b,j) + \mathrm{LOGX} \tag{C1} $$

and 

$$ \mathrm{SUMXX}(b,j) = \mathrm{SUMXX}(b,j) + \mathrm{LOGX} \times \mathrm{LOGX}. \tag{C2} $$

In addition, the number of pixels contributing to the sums in 
bin b is incremented 

$$ \mathrm{N}(b) = \mathrm{N}(b) + 1, \tag{C3} $$

and a binary-valued variable is set to 1 to indicate bin b contains 
data 

$$ \mathrm{NSEG}(b) = 1. \tag{C4} $$

After processing all valid data from this scene, the total weight 
for each bin is computed 

$$ \mathrm{W}(b) = \sqrt{\mathrm{N}(b)}, \tag{C5} $$

and the variable sums are weighted as per (B9) and (B10) 

$$ \mathrm{SUMX}(b,j) = \frac{\mathrm{SUMX}(b,j)}{\mathrm{W}(b)} \tag{C6} $$

and 

$$ \mathrm{SUMXX}(b,j) = \frac{\mathrm{SUMXX}(b,j)}{\mathrm{W}(b)}. \tag{C7} $$


Finally, a 16-bit number TT(b) is defined for each bin. This 
number will be used in subsequent stages of temporal binning 
to indicate the temporal distribution of the data. In the spatial 
binning algorithm, all bits in TT(b) will be 0 except the lowest 
bit which will be set to 1 if there are data in bin b. 

The output from the spatial binning algorithm consists of the 
spatial statistics for each bin: b, N(b), NSEG(b), W(b), TT(b), 
and a pair of weighted sums, SUMX(b,j) and SUMXX(b,j), for 
each variable j. 
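
To show how these saved statistics can be used, the sketch below recovers the mean and variance of ln(X) from a bin's weighted sums and total weight, and then forms lognormal estimates of the mean, median and mode. It is an illustration only: the function name is hypothetical, and the median and mode expressions are the standard lognormal results exp(m) and exp(m - s^2), which are assumed here to correspond to the estimators referenced elsewhere in this report.

```python
import math


def interpret_bin(sumx: float, sumxx: float, w: float) -> dict:
    """Derive statistics for one variable in one bin.

    sumx, sumxx: weighted sum and sum of squares of ln(X) for the bin
    w:           total weight saved with the bin
    """
    if w <= 0.0:
        raise ValueError("bin contains no data")
    m = sumx / w                        # mean of ln(X)
    s2 = max(sumxx / w - m * m, 0.0)    # variance of ln(X); guard round-off
    return {
        "log_mean": m,
        "log_variance": s2,
        "mean": math.exp(m + 0.5 * s2),   # MLE estimator of the mean
        "median": math.exp(m),            # assumed standard lognormal form
        "mode": math.exp(m - s2),         # assumed standard lognormal form
    }


if __name__ == "__main__":
    # A single-scene bin holding three CHL values (illustrative numbers).
    logs = [math.log(v) for v in (0.1, 0.2, 0.4)]
    w = math.sqrt(len(logs))
    stats = interpret_bin(sum(logs) / w, sum(x * x for x in logs) / w, w)
    print(stats)
```

Because both sums and the weight are divided by the same factor in the space binner, the ratio SUMX/W is simply the arithmetic mean of ln(X) for a single-scene bin.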

Space Binner Code: This program takes one level -2 scene as 
input and bins it into one (or two, if the level -2 scene crosses 
180° longitude) level -3 binned data product as output. This 
is called spatial binning since the bins are of lower resolution 
and the level -2 product is considered to represent a snapshot, 
i.e., no time averaging occurs, of the Earth’s surface. Products 
generated by this program are not archived but are used as 
input to the time binner. 

Variable and Constant Dictionary : The variables and their def- 
initions for the pseudocode are presented below. 

Constants 

MAXBINS The maximum number of bins (5,940,422). 

NVARS The number of derived level -3 geophysical vari- 
ables whose observational values are stored in the 
associated SUMX and SUMXX pairs. 

Level -2 Variables 

NPIXELS The number of pixels in a scan line of the input 
level-2 product. 

NSCANS The number of scan lines in the input level-2 prod- 
uct. 

PXLAT A real *4 1-D array of size NPIXELS; represents the 
latitude for a given pixel I of a given scan line L. 

PXLON A real*4 1-D array of size NPIXELS; represents the 
longitude for a given pixel I of a given scan line L. 

OBS A real*4 2-D array of size NPIXELS x NVARS; rep- 
resents the derived level -2 values that are to be 
binned into the level -3 product for a given pixel I 
of a given scan line L. 

Output Variables 

SUMX    A real*4 2-D array of size MAXBINS x NVARS; 
        represents the sum of the natural logarithm of the 
        level-3 geophysical variable's values divided by the 
        square root of the number of those values for a 
        given bin IDX; saved in the output product if, and 
        only if, N(IDX) is greater than zero. 

SUMXX   A real*4 2-D array of size MAXBINS x NVARS; 
        represents the sum of squares of the natural logarithm 
        of the level-3 geophysical variable's values divided 
        by the square root of the number of those values 
        for a given bin IDX; saved in the output product if, 
        and only if, N(IDX) is greater than zero. 

N       An integer*2 1-D array of size MAXBINS; 
        represents the number of values summed into SUMX and 
        SUMXX for all variables (Js) and for a given bin IDX; 
        saved in the output product if, and only if, N(IDX) 
        is greater than zero. 

NSEG    An integer*2 1-D array of size MAXBINS; 
        represents the number of level-2 scenes which contributed 
        to SUMX and SUMXX for all Js for a given bin IDX; 
        saved in the output product if, and only if, N(IDX) 
        is greater than zero. For this program, since only 
        one scene is input, all saved values of NSEG will be 
        1. 

W       A real*4 1-D array of size MAXBINS; represents the 
        weight factor for all Js for a given bin IDX; 
        calculated as the square root of N(IDX); saved in the 
        output product if, and only if, N(IDX) is greater 
        than zero. 

TT      An integer*2 1-D array of size MAXBINS; the bit 
        values of TT represent the time trend of the values 
        summed into SUMX and SUMXX for all Js for a given 
        bin IDX; saved in the output product if, and only 
        if, N(IDX) is greater than zero. For this program, 
        since only one scene is input, all saved values of TT 
        will have the lowest bit only set to 1. 

IDX     An integer*4 word representing the index number 
        of each bin with a value ranging from 1 to MAXBINS; 
        saved in the output product if, and only if, N(IDX) 
        is greater than 0. 

Note: For each N(IDX) > 0, 8 x NVARS + 14 bytes of 
information will be output. 

Other Variables 

I Counter index of pixels on a scan line. Range is 
from 1 to NPIXELS. 

J Counter index of geophysical variables to be binned. 
Range is from 1 to NVARS. 

L Counter index of scan lines. Range is from 1 to 
NSCANS. 

XLOG Natural logarithm (real*4) of OBS for a given I 
and J. 

# 
# Initialize 
# 
do from IDX=1 to MAXBINS 
   do from J=1 to NVARS 
      SUMX(IDX,J) = 0.0 
      SUMXX(IDX,J) = 0.0 
   end do 
   N(IDX) = 0 
   NSEG(IDX) = 0 
   TT(IDX) = 0 
end do 

read from level-2 scene: NPIXELS, NSCANS 
# 
# Input level-2 scene and accumulate stats for each bin 
# 

do from L=1 to NSCANS 
   read arrays PXLAT, PXLON, OBS for scan line L 
   do from I=1 to NPIXELS 
      if sample I passes screen flags then 
         IDX = get_bin_index(PXLAT(I),PXLON(I)) 
         do from J=1 to NVARS 
            XLOG = natural_log(OBS(I,J)) 
            SUMX(IDX,J) = SUMX(IDX,J) + XLOG 
            SUMXX(IDX,J) = SUMXX(IDX,J) + XLOG*XLOG 
         end do 
         N(IDX) = N(IDX) + 1 
         NSEG(IDX) = 1 
      end if 
   end do 
end do 
# 
# Divide sums by weight and output space binned product 
# 
do from IDX=1 to MAXBINS 
   if N(IDX) > 0 then 
      set lowest bit of TT(IDX) 
      W(IDX) = square_root(N(IDX)) 
      do from J=1 to NVARS 
         SUMX(IDX,J) = SUMX(IDX,J)/W(IDX) 
         SUMXX(IDX,J) = SUMXX(IDX,J)/W(IDX) 
      end do 

      write to space binned level-3 product: 
         IDX, N(IDX), NSEG(IDX), W(IDX), TT(IDX) 
         SUMX(IDX,J), SUMXX(IDX,J), for J=1 to NVARS 
      end write 
   end if 
end do 
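
The space binner pseudocode above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the operational code: the function name `space_bin` and the dictionary record layout are hypothetical, and the bin index of each pixel is assumed to be precomputed by the caller, since `get_bin_index` is not defined in this appendix. Pixels are assumed to have already passed the screening flags and to hold positive values.

```python
import math
from collections import defaultdict


def space_bin(samples, nvars):
    """Accumulate one level-2 scene into per-bin weighted log sums.

    samples: iterable of (bin_index, obs) pairs, where obs holds nvars
             positive values for a pixel that passed the screening flags.
    Returns a dict {bin_index: record} containing only bins with N > 0.
    """
    sumx = defaultdict(lambda: [0.0] * nvars)
    sumxx = defaultdict(lambda: [0.0] * nvars)
    n = defaultdict(int)

    # Accumulate the sum and sum of squares of the natural logarithms.
    for idx, obs in samples:
        for j in range(nvars):
            xlog = math.log(obs[j])
            sumx[idx][j] += xlog
            sumxx[idx][j] += xlog * xlog
        n[idx] += 1

    # Divide the sums by the weight W = sqrt(N) and build the output records.
    product = {}
    for idx, count in n.items():
        w = math.sqrt(count)
        product[idx] = {
            "N": count,
            "NSEG": 1,   # only one scene is input
            "W": w,
            "TT": 1,     # lowest bit set
            "SUMX": [s / w for s in sumx[idx]],
            "SUMXX": [s / w for s in sumxx[idx]],
        }
    return product
```

A sparse dictionary is used here in place of the MAXBINS-sized arrays of the pseudocode, because only bins with N(IDX) > 0 are ever written to the product.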

Temporal Binning Algorithm: The temporal binning algorithm 
combines the appropriate spatial statistics within each sampling 
domain. The sampling domain for a particular bin will be either 
a day, week, month, or year. 

For each set of spatial statistics there is an associated time 
t. The output from the spatial algorithm at time t will be the 
input for the temporal binning algorithm. Let this input be 
indexed by the time t: $N(b)_t$, $NSEG(b)_t$, $W(b)_t$, $TT(b)_t$, and 
the pairs of weighted sums, $SUMX(b,j)_t$ and $SUMXX(b,j)_t$, for 
each variable $X_j$. 

If $N(b)_t > 0$, then the temporal sums 

$$\mathrm{SUMX}(b,j) = \mathrm{SUMX}(b,j) + \mathrm{SUMX}(b,j)_t \tag{C8}$$

and 

$$\mathrm{SUMXX}(b,j) = \mathrm{SUMXX}(b,j) + \mathrm{SUMXX}(b,j)_t \tag{C9}$$

are incremented for each variable j. In addition, the number of 
pixels contributing to the sums is counted 

$$\mathrm{N}(b) = \mathrm{N}(b) + \mathrm{N}(b)_t \tag{C10}$$

and the number of spatial data sets (orbits) contributing to the 
sums 

$$\mathrm{NSEG}(b) = \mathrm{NSEG}(b) + \mathrm{NSEG}(b)_t. \tag{C11}$$

The sum of weights is computed 

$$\mathrm{W}(b) = \mathrm{W}(b) + \mathrm{W}(b)_t \tag{C12}$$

and the appropriate bit of the time distribution variable TT(b) 
is set to 1 to reflect that data were present at time t in bin b. 

Output from the temporal binning algorithm consists of the 
level-3 data for each bin: b, N(b), NSEG(b), W(b), TT(b), and 
a pair of weighted sums, SUMX(b,j) and SUMXX(b,j), for each 
variable j. Note that the output from the temporal binning 
algorithm is in the same form as its input. In fact, daily binned 
products can serve as input to the temporal binning algorithm 
to produce weekly, monthly, or longer-term products. 

Time Binner Code: This program takes as input level-3 binned 
segment products produced by the space binner and combines 
them into a binned product representing one day, or takes binned 
products produced by the time binner (this program) and combines 
them into longer-term binned products. This process is 
called temporal binning since it combines data over a certain 
time period while not changing their spatial resolution. 

Variable and Constant Dictionary: The variables and their 
definitions for the pseudocode are presented below. 

Constants 

MAXBINS The maximum number of bins (5,940,422). 

NVARS The number of derived level-3 geophysical variables 
whose observational values are stored in the 
associated SUMX and SUMXX pairs. 

Input Variables 

NBINS The number of bins to read from an input level-3 
product. 

SUMX_INPUT A real*4 1-D array of size NVARS; represents 
SUMX as output by the space or time binner for 
all level-3 geophysical variables (Js) of a given bin 
IDX being read. 

SUMXX_INPUT A real*4 1-D array of size NVARS; represents SUMXX 
as output by the space or time binner for all level-3 
geophysical variables (Js) of a given bin IDX being 
read. 

N_INPUT An integer*2 word; represents N as output by the 
space or time binner for a given bin IDX being read. 

NSEG_INPUT An integer*2 word; represents NSEG as output by 
the space or time binner for a given bin IDX being 
read. 

W_INPUT A real*4 word; represents W as output by the space 
or time binner for a given bin IDX being read. 

TT_INPUT An integer*2 word; represents TT as output by 
the space or time binner for a given bin IDX being 
read. 

IDX An integer*4 word representing the index number 
of the bin being read from an input level-3 
product. 


Output Variables 

SUMX A real*4 2-D array of size MAXBINS x NVARS; represents 
the sum of the SUMX_INPUT for the level-3 
geophysical variables (Js) from all input products 
for a given bin IDX; saved in the output product if, 
and only if, N(IDX) is greater than zero. 

SUMXX A real*4 2-D array of size MAXBINS x NVARS; represents 
the sum of the SUMXX_INPUT for the level-3 
geophysical variables (Js) from all input products 
for a given bin IDX; saved in the output product if, 
and only if, N(IDX) is greater than zero. 

N An integer*2 1-D array of size MAXBINS; represents 
the sum of the N_INPUT from all input products 
for a given bin IDX; saved in the output product 
if, and only if, N(IDX) is greater than zero. 

NSEG An integer*2 1-D array of size MAXBINS; represents 
the sum of the NSEG_INPUT from all input 
products for a given bin IDX; saved in the output 
product if, and only if, N(IDX) is greater than zero. 

W A real*4 1-D array of size MAXBINS; represents the 
sum of the W_INPUT from all input products for a 
given bin IDX; saved in the output product if, and 
only if, N(IDX) is greater than zero. 

TT An integer*2 1-D array of size MAXBINS; the bit 
sequence of TT represents the time trend of the values 
summed into SUMX and SUMXX for all Js for a 
given bin IDX; saved in the output product if, and 
only if, N(IDX) is greater than zero. The bits represent 
consecutive times in the binning period, the 
lowest bit being the earliest time. For daily binned 
products, the bits correspond to the relative sequence 
of orbits binned. For 8-day products, each 
bit represents one day; for monthly products, each 
bit represents two days; and for yearly products, 
each bit represents one month. A TT(IDX) bit will 
be set to 1 only if data, for the time corresponding 
to that bit, were binned in bin IDX. 

IDX An integer*4 word representing the index number 
of each bin with a value ranging from 1 to MAXBINS; 
saved in the output product if, and only if, N(IDX) 
is greater than 0. 

Note: For each N(IDX) > 0, 8 x NVARS + 14 bytes of information 
will be output. 

Other Variables 

J Counter index of geophysical variable to be binned. 
Range is from 1 to NVARS. 

B Counter index of bins read from input product. 
Range is from 1 to NBINS. 

# 
# Initialize 
# 
do from IDX=1 to MAXBINS 
   do from J=1 to NVARS 
      SUMX(IDX,J) = 0.0 
      SUMXX(IDX,J) = 0.0 
   end do 
   N(IDX) = 0 
   NSEG(IDX) = 0 
   W(IDX) = 0.0 
end do 
# 
# Input space or time binned products and accumulate 
# statistics for each bin 
# 
do for each binned input product 
   read from metadata of binned input products: NBINS 
   do from B=1 to NBINS 
      read from bin B: 
         IDX, N_INPUT, NSEG_INPUT, W_INPUT, TT_INPUT 
         SUMX_INPUT(J), SUMXX_INPUT(J), for J=1 to NVARS 
      end read 

      do from J=1 to NVARS 
         SUMX(IDX,J) = SUMX(IDX,J) + SUMX_INPUT(J) 
         SUMXX(IDX,J) = SUMXX(IDX,J) + SUMXX_INPUT(J) 
      end do 

      N(IDX) = N(IDX) + N_INPUT 
      NSEG(IDX) = NSEG(IDX) + NSEG_INPUT 
      W(IDX) = W(IDX) + W_INPUT 

      use TT_INPUT, date, or orbit of input to set TT(IDX) 
   end do 
end do 
# 
# Output time binned product 
# 
do from IDX=1 to MAXBINS 
   if N(IDX) > 0 then 
      write to time binned level-3 product: 
         IDX, N(IDX), NSEG(IDX), W(IDX), TT(IDX) 
         SUMX(IDX,J), SUMXX(IDX,J), for J=1 to NVARS 
      end write 
   end if 
end do 

Algorithms for Calculating Statistics of Level-3 Variables: The 
means, variances, medians, and modes can be estimated using 
the level-3 data as described in Section 2.3. Here the same 
equations are described in terms of the pseudocode logic used 
in this Appendix. The level-3 data provided for each bin are: 
b, N(b), NSEG(b), W(b), TT(b), and a pair of weighted sums, 
SUMX(b,j) and SUMXX(b,j), for each level-3 variable j. 

For each variable $X_j$, the mean and variance of its natural 
logarithm are calculated 

$$m_x = \frac{\mathrm{SUMX}(b,j)}{\mathrm{W}(b)} \tag{C13}$$

and 

$$s_x^2 = \frac{\mathrm{SUMXX}(b,j)}{\mathrm{W}(b)} - m_x^2. \tag{C14}$$

The MLE for the mean of $X_j$ in bin b is 

$$X_{\mathrm{mle}}(b,j) = e^{\left(m_x + s_x^2/2\right)} \tag{C15}$$

and the standard deviation of $X_j$ is estimated by 

$$\mathrm{SD}(b,j) = X_{\mathrm{mle}}(b,j)\,\sqrt{e^{s_x^2} - 1} \tag{C16}$$

and $[\mathrm{SD}(b,j)]^2$ is the estimated variance. 

Assuming the distribution of $X_j$ is approximately lognormal, 
then the median can be estimated by 

$$X_{\mathrm{med}}(b,j) = e^{m_x}, \tag{C17}$$

and the mode (most common value) by 

$$X_{\mathrm{mod}}(b,j) = e^{\left(m_x - s_x^2\right)}. \tag{C18}$$
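
Taken together, Equations C8, C12, and C13 imply that the log-mean of a temporally binned product is a weighted average of the per-scene log-means, with weights equal to the square root of each scene's pixel count. The Python sketch below illustrates this for a single bin and a single variable. It is a minimal sketch under stated assumptions: the helper names `scene_record` and `merge_bins` and the dictionary layout are hypothetical, and mapping input number t to bit t of TT is an assumption made only for illustration.

```python
import math


def scene_record(values):
    """Build a space-binned record for one bin and one variable (hypothetical helper)."""
    logs = [math.log(v) for v in values]
    w = math.sqrt(len(logs))  # weight W = sqrt(N)
    return {
        "N": len(logs), "NSEG": 1, "W": w, "TT": 1,
        "SUMX": [sum(logs) / w],
        "SUMXX": [sum(x * x for x in logs) / w],
    }


def merge_bins(records):
    """Combine the records of one bin from several inputs (Equations C8 to C12)."""
    nvars = len(records[0]["SUMX"])
    out = {"N": 0, "NSEG": 0, "W": 0.0, "TT": 0,
           "SUMX": [0.0] * nvars, "SUMXX": [0.0] * nvars}
    for t, rec in enumerate(records):
        if rec["N"] <= 0:
            continue  # inputs without data do not contribute
        for j in range(nvars):
            out["SUMX"][j] += rec["SUMX"][j]    # C8
            out["SUMXX"][j] += rec["SUMXX"][j]  # C9
        out["N"] += rec["N"]                    # C10
        out["NSEG"] += rec["NSEG"]              # C11
        out["W"] += rec["W"]                    # C12
        out["TT"] |= 1 << t  # assumption: input t sets bit t
    return out


# Two scenes observing the same bin: four pixels, then one pixel.
a = scene_record([1.0, 2.0, 4.0, 8.0])
b = scene_record([3.0])
merged = merge_bins([a, b])

# Equation C13 applied to the merged record.
m_x = merged["SUMX"][0] / merged["W"]
```

Here the weights are sqrt(4) = 2 and sqrt(1) = 1, so `m_x` equals (2 x mean_a + 1 x mean_b) / 3, where mean_a and mean_b are the log-means of the two scenes. The single-pixel scene therefore carries more influence than it would in a straight pixel-count average.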


Bin Data Interpreter Code: This program interprets the geo- 
physical data from binned products created by the space binner 
or the time binner. It will calculate the maximum likelihood 
estimate (MLE) of the mean, standard deviation, median, and 
mode for each level -3 binned geophysical variable. 


Variable and Constant Dictionary: The variables and their 
definitions for the pseudocode are presented below. 

Constants 

NVARS The number of derived level-3 geophysical variables 
whose observational values are stored in the 
associated SUMX_INPUT and SUMXX_INPUT pairs. 

Level-3 Input Variables 

NBINS The number of bins to read from an input level-3 
product. 

SUMX_INPUT A real*4 1-D array of size NVARS; represents SUMX 
as output by the space or time binner for all 
level-3 geophysical variables (Js) of a given bin 
IDX being read. 

SUMXX_INPUT A real*4 1-D array of size NVARS; represents SUMXX 
as output by the space or time binner for all 
level-3 geophysical variables (Js) of a given bin 
IDX being read. 

N_INPUT An integer*2 word; represents N as output by 
the space or time binner for a given bin IDX being 
read. 

NSEG_INPUT An integer*2 word; represents NSEG as output 
by the space or time binner for a given bin IDX 
being read. 

W_INPUT A real*4 word; represents W as output by the space 
or time binner for a given bin IDX being read. 

TT_INPUT An integer*2 word; represents TT as output by 
the space or time binner for a given bin IDX being 
read. 

IDX An integer*4 word representing the index number 
of the bin being read from the input level-3 
product. 

Output Variables 

XMEAN A real*4 1-D array of size NVARS; represents the 
mean of the weighted cumulative values of the 
level-3 geophysical variables (Js). 

SIGMA A real*4 1-D array of size NVARS; represents the 
standard deviation for the weighted cumulative 
values of the level-3 geophysical variables (Js). 

XMEDN A real*4 1-D array of size NVARS; represents the 
median of the weighted cumulative values of the 
level-3 geophysical variables (Js). 

XMODE A real*4 1-D array of size NVARS; represents the 
mode of the weighted cumulative values of the 
level-3 geophysical variables (Js). 

N_INPUT An integer*2 word; represents N as output by 
the space or time binner for a given bin IDX being 
output. 

NSEG_INPUT An integer*2 word; represents NSEG as output 
by the space or time binner for a given bin IDX 
being output. 

TT_INPUT An integer*2 word; represents TT as output by 
the space or time binner for a given bin IDX being 
output. 

IDX An integer*4 word representing the index number 
of the bin being output. 

Other Variables 

B Counter index of bins read from input product. 
Range is from 1 to NBINS. 

J Counter index of geophysical variables that have 
been binned. Range is from 1 to NVARS. 

AVLOGS A real*4 word that represents the mean of the 
weighted logs for a geophysical variable J of bin B 
being processed. Used to calculate XMEAN, SIGMA, 
XMEDN, and XMODE. 

VRLOGS A real*4 word that represents the variance of the 
weighted logs for a geophysical variable J of bin B 
being processed. Used to calculate XMEAN, SIGMA, 
and XMODE. 



# 
# Input information for each bin 
# 
read from metadata of binned input products: NBINS 
do from B=1 to NBINS 
   read from bin B: 
      IDX, N_INPUT, NSEG_INPUT, W_INPUT, TT_INPUT 
      SUMX_INPUT(J), SUMXX_INPUT(J), for J=1 to NVARS 
   end read 
# 
# Calc. mean, std. dev., median and mode, and then output 
# 
   do from J=1 to NVARS 
      AVLOGS = SUMX_INPUT(J)/W_INPUT 
      VRLOGS = (SUMXX_INPUT(J)/W_INPUT) - (AVLOGS*AVLOGS) 
      XMEAN(J) = exponential(AVLOGS + (VRLOGS/2.)) 
      SIGMA(J) = XMEAN(J) * sqroot(exponential(VRLOGS) - 1) 
      XMEDN(J) = exponential(AVLOGS) 
      XMODE(J) = exponential(AVLOGS - VRLOGS) 
   end do 

   write to screen or file useful info for bin IDX 
      IDX, N_INPUT, NSEG_INPUT, TT_INPUT 
      XMEAN(J), SIGMA(J), XMEDN(J), XMODE(J), for J=1 to NVARS 
   end write 
end do 
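
The per-bin calculation in the interpreter pseudocode can be sketched in Python as shown below. This is an illustrative sketch only: the function name `interpret_bin` and its argument layout are hypothetical, and the clamp of the log-variance at zero is an added safeguard against floating-point rounding that does not appear in the pseudocode.

```python
import math


def interpret_bin(w, sumx, sumxx):
    """Return a list of (mean, sigma, median, mode) tuples, one per variable.

    w:     the W value of the bin record (sum of weights, must be > 0)
    sumx:  the SUMX values of the bin record, one per variable
    sumxx: the SUMXX values of the bin record, one per variable
    """
    stats = []
    for sx, sxx in zip(sumx, sumxx):
        avlogs = sx / w                         # mean of the logs
        vrlogs = sxx / w - avlogs * avlogs      # variance of the logs
        vrlogs = max(vrlogs, 0.0)               # assumption: guard against rounding
        xmean = math.exp(avlogs + vrlogs / 2.0)
        sigma = xmean * math.sqrt(math.exp(vrlogs) - 1.0)
        xmedn = math.exp(avlogs)
        xmode = math.exp(avlogs - vrlogs)
        stats.append((xmean, sigma, xmedn, xmode))
    return stats


# One bin, one variable, four pixels whose logs are 0, 0, 2, 2:
# W = sqrt(4) = 2, SUMX = 4/2 = 2, SUMXX = 8/2 = 4.
result = interpret_bin(2.0, [2.0], [4.0])
```

In this example the log-mean and log-variance are both 1, so the four estimates are exp(1.5), exp(1.5) x sqrt(e - 1), e, and 1. The ordering mode <= median <= mean holds for any lognormal distribution.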


Glossary 

AVHRR Advanced Very High Resolution Radiometer 
CZCS Coastal Zone Color Scanner 
DSP Not an acronym; the name of a software package 
developed at RSMAS. 
GAC Global Area Coverage 
GMT Greenwich Mean Time 
HRPT High Resolution Picture Transmission 
IFOV Instantaneous Field-of-View 
ISCCP International Satellite Cloud Climatology Project 
LAC Local Area Coverage 
MARMAP Marine Resources Monitoring, Assessment, and Prediction 
MODIS Moderate Resolution Imaging Spectroradiometer 
RSMAS Rosenstiel School of Marine and Atmospheric Science 
SeaWiFS Sea-viewing Wide Field-of-view Sensor 
SEEP Shelf Edge Exchange Program 
SST Sea Surface Temperature 
TDI Time Delay and Integration 

Symbols 

A_g CZCS pigment algorithm constant (global). 
A_r CZCS pigment algorithm constant (regional). 
AVG Arithmetic average based on LAC data. 
AVG4 Arithmetic average based on GAC data. 
b Bin index number. 
B_g CZCS pigment algorithm constant (global). 
B_r CZCS pigment algorithm constant (regional). 
(Chl)tot Integral euphotic chlorophyll. 
CHL Chlorophyll concentration. 
CHL13 Pigment concentration calculated from CZCS bands 
1 and 3. 
CHL23 Pigment concentration calculated from CZCS bands 
2 and 3. 
DIFF1 Relative difference between MLE4 and AVG4. 
DIFF2 Relative difference between MED4 and AVG4. 
E[X] Expected value of X. 
ERROR Relative error, in percent, of the estimated mean 
from the arithmetic mean. 
FNC Function of vector variable X using LAC data. 
FNC4 Function of vector variable X using GAC data. 
ICk Integrated chlorophyll concentration over the first 
optical depth. 
ID Mooring identification number. 
K(490) The diffuse attenuation coefficient at 490 nm. 
L Bin dimension in kilometers. 
L_WN(λ_i) Normalized water-leaving radiances in i bands (1-5). 
L_a(λ_i) Atmospheric aerosol radiances in i bands (6-8). 
L_W Water-leaving radiance. 
m_i Central moment of x. 
m_x Sample mean of the natural logarithm of x. 
m_y Sample mean of the natural logarithm of y. 
m_r Sample mean of regional ln(pigment). 
MLE Maximum likelihood estimator of LAC data. 
MLE4 Maximum likelihood estimator of GAC data. 
MED Geometric mean or median of LAC data. 
MED4 Geometric mean or median of GAC data. 
n Sample size. 
n_i The number of pixels per bin on orbit i. 
n The number of days used for temporal averaging. 
N The number of orbits contributing to the temporal 
mean. 
P The proportion of the distribution that is mode 1. 
PIG CZCS pigment-like concentration. 
PIG_r Pigment calculated with regionally-derived parameters. 
s_r^2 The sample variance of regional ln(pigment). 
s_x^2 The sample variance of the natural logarithm of x. 
s_y^2 The sample variance of the natural logarithm of y. 
S_1 The weighted sum of variable x. 
S_2 The weighted sum of variable y. 
SD_x The standard deviation of x. 
SD_y The standard deviation of y. 
t_i The time at which orbit i was acquired. 
T A 16-bit time distribution variable. 
V The 8-bit image value of a pixel, i.e., gray level. 
W_i The weight factor for orbit i. 
W The sum of the weighting factors. 
x The natural logarithm of X. 
X Any random variable whose distribution is unknown. 
X A level-2 variable. 
X̄ The true mean of a level-2 variable. 
X_avg The arithmetic average of X. 
X_geom The geometric mean of X. 
X_mle The maximum likelihood estimator of X. 
X_med The median of X. 
X_mod The mode of X. 
X_est The estimated mean of X. 
X The vector of standard level-2 variables. 
Y Any function of a level-2 variable X. 
Ȳ The true mean of Y. 
Y_avg The arithmetic average of Y. 
Y_mle The maximum likelihood estimator of Y. 
Y_fnc The arithmetic mean of FNC. 
Z_e The euphotic depth (depth to 1% light level). 
λ_1 Wavelength of 440 nm. 
λ_2 Wavelength of 520 nm. 
λ_3 Wavelength of 550 nm. 
τ_a(865) The aerosol optical thickness at 865 nm. 

COLOR PLATES 

PLATE 1. Mean CHL images derived from the AVG and MLE estimators for the seven CZCS scenes listed in Table 1. 

PLATE 2. Mean CHL images derived from the AVG4 and MLE4 estimators for the seven CZCS scenes listed in Table 1. 

PLATE 3. Differences between level-3 means derived from MLE and AVG estimators (upper images) and differences 
between MLE4 and AVG4 estimates (lower images). 